How can we know if a housing website is suggesting the same homes to home-seekers of different races, or illegally steering some users toward neighborhoods where they demographically “belong”? Or whether employment websites are showing qualified women’s resumes to employers at the same rate as men’s?
Because such websites tend to make decisions that are automated by code that is proprietary and hidden, members of the public wouldn’t know unless someone tested the outcomes produced by these algorithms. Researchers and journalists want to do this testing to inform the debate about online business practices, just as they have in the offline world.
Unfortunately, they conduct these investigations in the shadow of a federal criminal statute called the Computer Fraud and Abuse Act, which perversely grants businesses that operate online the power to shut down any testing of their practices they don’t like. Intended to punish malicious hacking, the CFAA contains broad and vague language making it a crime to access a website in a manner that “exceeds authorized access.” This provision has been interpreted to prohibit an individual from visiting a website in a manner that violates the website’s terms of service. But common website terms of service prohibit activities like copying publicly available information (“scraping”), creating multiple accounts, or providing false information, even though these activities are often necessary for robust testing, including the kind of testing that would uncover discrimination on the internet.
We are challenging this provision in federal court on behalf of a group of academic researchers and The Intercept, a media organization. The lawsuit seeks to remove the barrier posed by the CFAA’s overbroad criminal prohibitions. In the meantime, here is some advice for journalists and researchers doing this important work. You can read our full paper on this subject here.
First, do no harm.
To avoid liability, journalists should design their investigations to avoid placing too much stress on the target’s computers or servers. The idea is to ensure, to the extent possible, that the servers continue to function as they would without the investigation. Conducting a careful investigation makes it less likely that the target company can argue damage to its machines or its regular business operations.
Practically speaking, this means, for example, designing software to make a small number of requests repeatedly over a long period of time, rather than overwhelming a server by running all of the requests at once. Journalists should also consider running bots and scrapers at off-hours, when servers are not likely to be experiencing much traffic, though this may be impossible with some services (like trip or route planners) that are highly sensitive to the time of day they are tested. Finally, investigations that trigger real-world events (for example, hailing a car service or reserving lodging) should be limited in scope.
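As a rough illustration of the pacing advice above, here is a minimal Python sketch that issues requests one at a time with a fixed pause between them and checks whether the current time falls in a quiet window. The names `paced_fetch` and `is_off_hours`, the 30-second delay, and the 1 a.m. to 5 a.m. window are all assumptions made for illustration, not recommendations from the paper; appropriate values depend on the service being tested.

```python
import time
from datetime import datetime
from typing import Callable, Iterable, List


def is_off_hours(now: datetime, start_hour: int = 1, end_hour: int = 5) -> bool:
    """Return True if `now` falls in the assumed quiet window [start_hour, end_hour)."""
    return start_hour <= now.hour < end_hour


def paced_fetch(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    delay_seconds: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Fetch each URL sequentially, pausing between consecutive requests.

    `fetch` is supplied by the caller (hypothetical here), so the pacing
    logic stays independent of any particular HTTP library.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # spread the load instead of sending a burst
        results.append(fetch(url))
    return results
```

Passing `fetch` and `sleep` in as parameters is a design choice that lets the pacing be checked without touching a live server. In a real investigation, `fetch` would wrap whatever HTTP client the team uses, and the run could be skipped whenever `is_off_hours(datetime.now())` is false.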
Does fear of negative publicity protect you?
Imagine: A data journalist working for a major publication conducts an investigation that reveals that a platform operated by a large and publicly traded company systematically disadvantages women or people of color in some way. When the platform gets wind of the investigation, it sues the journalist and the publication claiming damages from the test. How would the company look to the public when news of this retaliatory suit got out?
In recent years, many technology companies have been sensitive to allegations of discrimination and to any publicity that makes them look like bad actors. This sensitivity might offer data journalists some protection. Exactly how much will depend on the footprint of the journalist and publication involved, the size and corporate culture of the target, and the extent to which its business is public-facing and reliant on the trust of its consumer base. It will also depend on the details of the investigation: the more newsworthy the topic, the more protection a journalist may have. For example, an investigation into gender discrimination in job recruiting may generate widespread interest and more protection from public attention.
Consider informing the investigated entity.
Researchers might consider seeking permission for testing from the entities they want to investigate. If a target grants permission, that would preclude any argument that the testing activities violated the CFAA’s authorization provisions. However, if the targeted entity refuses permission, a researcher may find herself in a worse legal position than before if she goes ahead with the research. (There may be, of course, other downsides to seeking permission, including the possibility that the targeted entity makes it technologically impossible to conduct the proposed research.)
Mount a defense based on civil rights enforcement.
If a journalist conducting research into algorithmic discrimination is alleged to have accessed, copied, or published information obtained through falsity or deception, she could raise the defense that the online testing was the equivalent of offline testing long approved by the courts. Courts have recognized, in the context of fair housing, that testers are necessary for enforcement, even though they are not genuinely interested in the housing they claim to seek during the test. Courts have even acknowledged that deception is involved in testing, and nonetheless have permitted it. As one appellate court put it:
“It is surely regrettable that testers must mislead commercial landlords and home owners as to their real intentions. . . Nonetheless, we have long recognized that this requirement of deception was a relatively small price to pay to defeat racial discrimination. The evidence produced by testers . . . is a major resource in society’s continuing struggle to eliminate the subtle but deadly poison of racial discrimination.”
Congress passed a law ensuring that the federal government directly funds testing related to fair housing issues. Testing has similarly been recognized by some as a vital part of the enforcement of anti-discrimination laws in employment. The more closely an online audit test resembles these offline tests, the more persuasive this argument will likely be to a court.
Journalists and researchers can find more on how to protect themselves while conducting online investigations here.