
Will ChatGPT Revolutionize Surveillance?

A screen with the ChatGPT tool open.
Everybody's talking about ChatGPT's amazing ability to write, but its ability to read may be just as significant.
Jay Stanley,
Senior Policy Analyst,
ACLU Speech, Privacy, and Technology Project
April 19, 2023

Everywhere you look lately, people are discussing the potential negative uses and consequences of the AI-driven chatbot ChatGPT. Many are concerned about the potential for ChatGPT and other "large language models" (LLMs) to spread a fog of disinformation throughout our discourse, and to absorb the racism and other biases that permeate our culture and reflect them back at us in authoritative-sounding ways that only serve to amplify them. There are privacy concerns around the data that these models ingest from the internet and from users, and even problems with the models defaming people.

But there's another consequence of this technology that may prove to be very significant: its use as a tool for surveillance. If ChatGPT can "understand" complex questions and generate complex answers, it stands to reason that it may be able to understand much of what is said in a wiretap or other eavesdropped conversation, and flag particular conversations that are "suspicious" or otherwise of interest for humans to act upon. That, in turn, could lead to an enormous scaling up of the number of communications that are meaningfully monitored.

To get a feel for the possibilities here, I asked ChatGPT some questions.

A Rudimentary Test Run: Talking to ChatGPT

To start off, I asked the model, "How suspicious is someone who says, 'I really hate the president'?" ChatGPT answered, "It is not necessarily suspicious for someone to express dislike or hatred for a political figure, such as the president. People have varying opinions and beliefs about political leaders, and expressing them is a normal part of political discourse and free speech."

So far, so good. "However," it continued, "if the person's statement is accompanied by specific and credible threats of harm or violence towards the president or others … then it may be cause for concern. In general, it's important to consider the context and tone of the statement, as well as any accompanying behavior, before making a judgment about its level of suspicion or potential threat."

Pretty good. Next, I gave ChatGPT a list of statements and told it to rate how suspicious each one was on a scale of 1-10. Though it again issued reasonable-sounding caveats, it dutifully complied, producing a table of suspiciousness ratings for each statement.

Even in this rudimentary little experiment we can see how an LLM like ChatGPT can not only write, but can read and judge. The technology could be put to service as a lookout for statements that score highly by some measure – "suspiciousness" in my example, though one could attempt a variety of other monitoring projects, such as flagging "employees who are looking for a new job" or "employees who have a positive attitude toward Edward Snowden." (I ran a collection of published letters to the editor through ChatGPT, asking it to rate how positive each one was toward Snowden, and it was quite accurate.)
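
To make the mechanics concrete, here is a minimal sketch of how such a lookout could be automated. It assumes the pre-1.0 openai Python package; the prompt wording, model choice, and the helper name rate_suspiciousness are illustrative inventions, not the setup actually used for the experiment above.

```python
# Minimal sketch of automated "suspiciousness" scoring with an LLM.
# Assumes the pre-1.0 `openai` Python package and an API key in the
# OPENAI_API_KEY environment variable. Prompt wording, model choice,
# and the helper name are illustrative, not the author's actual setup.
import openai

def rate_suspiciousness(statement: str) -> str:
    prompt = (
        "On a scale of 1-10, how suspicious is someone who says the "
        f'following? Reply with only a number.\n\n"{statement}"'
    )
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep ratings as repeatable as possible
    )
    return response["choices"][0]["message"]["content"].strip()

for statement in [
    "I really hate the president",
    "Free this week for a quick gossip before I go and destroy America",
]:
    print(rate_suspiciousness(statement), "-", statement)
```

A real monitoring pipeline would simply loop something like this over transcripts, emails, or posts and route the high scorers to a human reviewer – which is exactly the scaling dynamic discussed below.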

No Shortage of Potential Uses

There is a lot of demand for communications monitoring – by both government and the private sector, and covering not only private communications but public ones as well, such as social media posts.

In general, it is not constitutional for the government to monitor private communications without a warrant. Nor is it legal under our wiretapping laws for companies or individuals to do so. But there are plenty of exceptions. The National Security Agency collects communications en masse around the world, including, despite its putative foreign focus, vast amounts of internet traffic entering and exiting the United States, including that of Americans. We believe this is unconstitutional, but our challenges have so far been dismissed on secrecy grounds. Companies also monitor private communications when carried out by their workers on work-owned devices. (Financial companies can be required to do so.) Prisons record inmates' phone calls, and call centers record their customers ("This call may be monitored…").

When it comes to public communications, government agencies including the Department of Homeland Security and the FBI monitor social media for wide-ranging purposes such as threat detection, the screening of travelers, and that perennial catch-all goal, "situational awareness." Companies also sometimes search through their workers' social media posts.

Currently, much of that monitoring is done through keyword searches, which flag the appearance of a particular word or words in a communications stream (aided in the case of oral conversations by speech-to-text engines). More sophisticated versions of keyword searches might look for multiple words or their synonyms appearing near each other and try to use other surrounding words for context. Some corporate products for such monitoring claim to use "AI" (though that's a typical marketing buzzword, and it's often unclear what it means).
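
For contrast, the keyword-and-proximity approach described above can be expressed in a few lines. This is a simplified illustration – the watch terms, paired terms, and ten-word window are invented for the example – rather than any vendor's actual product:

```python
import re

# Simplified keyword/proximity scanner of the kind described above.
# The watchlist, the paired terms, and the 10-word window are invented
# for illustration; real monitoring products are more elaborate.
WATCHLIST = {"bomb", "attack", "destroy"}
PAIRED_TERMS = ({"fertilizer"}, {"truck", "rental"})
WINDOW = 10  # flag paired terms appearing within 10 words of each other

def flag(text: str) -> bool:
    words = re.findall(r"[a-z']+", text.lower())
    # Reflexive single-keyword hit, with no sense of context.
    if WATCHLIST & set(words):
        return True
    # Crude proximity test: one term from each set appearing close together.
    first = [i for i, w in enumerate(words) if w in PAIRED_TERMS[0]]
    second = [i for i, w in enumerate(words) if w in PAIRED_TERMS[1]]
    return any(abs(i - j) <= WINDOW for i in first for j in second)

print(flag("before I go and destroy America"))       # True: keyword hit, context ignored
print(flag("bought fertilizer and rented a truck"))  # True: proximity hit
```

A scanner like this has no notion of context at all – which is precisely the limitation the LLM approach promises to address, for better or worse, in the discussion that follows.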

In any case, LLMs appear to have brought the potential for automated contextual understanding to a whole new level. We don't know how sophisticated the automated monitoring systems at the NSA have become, though in general the private sector has often outpaced even the best-funded big government agencies when it comes to innovations like this. But even if the NSA already had some form of an LLM, this tool has now been brought into the open, giving everybody else the ability to interpret language in far more sophisticated ways than was previously possible.

Accuracy and Unfairness Remain Core Concerns

The amazing performance of LLMs does not mean they will be accurate. My little experiment above shows that LLMs are likely to interpret statements that have perfectly innocent meanings – that refer to fiction or reflect sarcasm, hyperbole, or metaphor – as highly suspicious. More extensive experiments would have to be done to test the ability of an LLM to judge the suspiciousness of longer statements, but at the end of the day, these systems still work by stringing words together in patterns that reflect the oceans of data fed to them; what they lack is a mental model of the world, with all its complexities and nuance, which is necessary to properly interpret complex texts. They are likely to make big errors.

Some may argue that because LLMs are more sophisticated than something like a keyword scanner, they will do less harm as eavesdroppers: their greater ability to take account of context will make them better able to flag only conversations that are, in fact, truly suspicious.

But it's not entirely clear whether more or fewer innocent people would be flagged as an AI eavesdropper gets smarter. It's true that by recognizing context, LLMs may skip over many uses of keywords that would be reflexively flagged by even the most sophisticated keyword scanner. At the same time, however, they may also flag mundane words, such as "fertilizer" and "truck," that a keyword scanner might ignore but that an LLM, with its greater sensitivity to context, would flag in combination – based on a recognition that fertilizer can be used to make truck bombs, say, and a received belief that people with radical views are more likely to build such bombs.

In short, an LLM may make more sophisticated mistakes, but it may make just as many. And the very sophistication of the model's judgments may lead human reviewers to take an AI warning much more seriously, perhaps subjecting the speaker to investigation and privacy invasions. The racism that the models absorb from the larger culture could also have very real consequences. Then there's ChatGPT's propensity for making stuff up; it's unclear how that might play in.

But however effective or problematic LLMs prove to be as eavesdroppers' aides – and that is likely to vary by context – we can be sure that all kinds of institutions are going to try them out.

A Question of Scale

Despite the unreliability of ChatGPT and its ilk, humans are also plenty capable of being erratic, ignorant of context, and generally stupid. The last statement in the above table was a 2012 tweet from a 26-year-old British man, Leigh Van Bryan, who was excited about his trip to Los Angeles with a friend. Upon arrival in Los Angeles the two were detained, held in jail for 12 hours, and blocked from entering the United States despite their attempts to explain that "destroy" was British slang for "party in." Van Bryan had also exuberantly tweeted that he was going to be "diggin' Marilyn Monroe up" on Hollywood Boulevard (though she is not buried there), a reference to a line from the TV show "Family Guy." Literal-minded federal agents searched the pair's suitcases looking for shovels.

Regardless of relative intelligence levels, the biggest harm that might come from the use of LLMs in surveillance may simply be an expansion in the amount of surveillance that they bring about. Whether by humans or computers, attempts to interpret and search masses of communications are inevitably erratic and overbroad – we have already seen this in corporate social media content-regulation efforts. But if a lot more communications are being meaningfully monitored because humans perceive LLMs as better at it, many more people will be flagged and potentially hurt.

Hiring humans to review communications is expensive, and they're distractible and easily bored, especially when required to pore over large amounts of ordinary activity looking for very rare events. If only as a matter of economics, AI agents would be able to ingest, scrutinize, and judge far more social media postings, emails, and audio transcripts than humans can. Not only will that likely result in a higher volume of the kinds of monitoring that are already happening, but it will likely encourage an expansion in the parties that are doing it, and the purposes for which they do it. A company that has never considered monitoring its employees' internet postings may decide to do so, for example, if it's cheap and easy, and it doesn't seem to generate too many false alarms. Or it might move from searching for signs of reputational damage to gathering intelligence on which employees are thinking of leaving or are not dedicated to the company. Because why not? It's all so easy to do. Any institution that thinks it can increase its power and control by using LLMs for surveillance will likely do so.

No matter how smart LLMs may become, if they result in an expansion of surveillance – for purposes both serious and trivial – they will engage in far more misunderstandings and false alarms. And that, in turn, would create chilling effects that affect everyone. As stories of various institutions' "successes" in flagging suspicious communications emerge – not to mention their mistakes – we would all begin to feel the growing presence of machines listening in. And, in certain contexts, begin to subtly or not-so-subtly censor ourselves lest we cause one of those AI minders to flag us. In this, LLMs may have the same effect with regard to communications that video analytics may have when it comes to video cameras.

We need to recognize that large-scale machine surveillance is likely coming our way, and whether the machines perform well or badly, better privacy laws will be vital to prevent powerful institutions from leveraging technology like LLMs to gain even more power over ordinary people, and to protect the values of privacy and free expression that we have always cherished.
