Who Is Liable When Generative AI Says Something Harmful?

October 11, 2023

Courts will have to grapple with this new challenge, although scholars believe much of generative AI will be protected by the First Amendment.

When interacting with ChatGPT, you may have encountered a response that starts with “As a language model …” before it politely refuses to do your bidding. The model is trained to respond like this to potentially harmful queries. These queries are discovered through what’s called “red teaming” in which a team of people — some technical experts and some not — craft queries to try to trigger harmful behavior from the model, which then helps model developers create safeguards. In fact, the White House helped put together a red teaming effort for some of the most advanced foundation models. Red teaming isn’t perfect, though. Researchers have crafted a number of different “jailbreaks,” both text-based and image-based, which cause the model to engage in the harmful behavior despite safeguards.

But what happens when the safeguards fail and models are used for harm? Who is liable? In two new papers, my co-authors and I find that it will not be so easy to impose liability on general foundation model creators and deployers, assuming current large language model-style systems do not take autonomous actions in the world. Those seeking to impose liability on creators and deployers will generally be constrained by the First Amendment, face other difficult-to-meet statutory requirements, and in some cases face defendants that could retain immunity from liability under a law referred to as “Section 230.” Ultimately, in all of these cases courts will have to wade through low-level technical details that significantly affect any liability analysis.

In “Freedom Of Speech And AI Output,” a recent article in the Journal of Free Speech Law, my co-authors Eugene Volokh (a professor at UCLA Law School and visiting fellow at the Hoover Institution) and Mark Lemley (a professor at Stanford Law School) and I make the case that generative AI outputs (at least the speech-like ones) are likely entitled to First Amendment protections. This is in part because the model creators may have the right to channel their speech through the model, but also because users have a right to listen to that speech. In a landmark case, Lamont v. Postmaster General, the Supreme Court struck down a law that required “communist political propaganda” to be confiscated by the U.S. Postmaster General before it reached the U.S. person who had requested it. In a concurring opinion, Justice Brennan wrote, “I think the right to receive information is such a fundamental right.”

If AI outputs are indeed protected by the First Amendment, that creates challenges for holding model creators and deployers liable under a number of criminal and civil statutes when AI models do harmful things. That is exactly what Stanford assistant professor Tatsunori Hashimoto, Mark Lemley, and I break down further in our other work, “Where's the Liability in Harmful AI Speech?” also recently published in the Journal of Free Speech Law.

Consider a scenario where a terrorist organization tries to repurpose a machine learning model to recruit for its acts of terror. Two Supreme Court cases recently dealt with a similar question: Twitter v. Taamneh and Gonzalez v. Google. They addressed whether recommendation systems at Twitter, YouTube, and other social media companies aided ISIS in recruiting and committing acts of terror. If so, the companies could be liable under a law called the Justice Against Sponsors of Terrorism Act. In Taamneh, the Court ultimately found that the companies were not liable for their recommender systems’ involvement in any ISIS recruiting. The decision rested partly on the difficulty of showing that the companies knew their platforms were being used to aid the specific act in question, and the plaintiffs would likely have faced further difficulties due to Section 230 immunity. In Gonzalez, however, Justice Gorsuch specifically noted that generative AI likely would not be covered by Section 230.

So, what happens when generative AI does not receive immunity from liability? What if ChatGPT is directly used by ISIS for recruiting people to commit an act of terror, paralleling the recent Supreme Court cases? Who would be liable? We argue that there are a lot of roadblocks to finding liability against the model creator or model host of foundation models.

In some cases, generative AI can still be designed in ways that preserve a plausible Section 230 immunity argument (for example, by designing it to piece together bits of third-party content, as Google does for its search snippets). As in the recent Supreme Court cases, it will also be difficult to prove that the model creator knew that ISIS was using its model for this specific harmful purpose. And the creator could argue that ISIS repurposed the system for harm via elaborate prompt engineering or fine-tuning, bypassing safeguards and thus breaking the causal link to the downstream harm. This doesn’t even begin to grapple with the First Amendment challenges that will likely come along with such cases.

Much of the analysis will rest on minute details and technical design decisions. Strangely, in some ways, current law might incentivize a more hands-off approach to safety to preserve Section 230 immunity. The more you craft (or “align”) the speech of the model with your own preferences, rather than just regurgitating third-party content, the less likely you are to preserve Section 230 immunity.

Ultimately, we are almost certain to see Supreme Court cases dealing with the harms of generative AI in the not-so-distant future. The decisions in these cases will affect a large swath of future technical designs, as model creators attempt to avoid liability. We argue that policymakers and regulators may want to intervene before it comes to that, and to understand at a deep technical level which design decisions they will incentivize with their interventions.

Peter Henderson is finishing his PhD in computer science at Stanford, received his JD from Stanford Law School, and is an incoming assistant professor at Princeton University with appointments in the Department of Computer Science and School of Public and International Affairs.

Stanford HAI’s mission is to advance AI research, education, policy and practice to improve the human condition.
