Policy Brief

Foundation Models and Copyright Questions

Date: November 02, 2023
Topics: Foundation Models; Regulation, Policy, Governance
Read Paper
Abstract

This brief warns that fair use may not fully shield U.S. foundation models trained on copyrighted data and calls for combined legal and technical safeguards to protect creators.

Key Takeaways

  • Foundation models—AI models trained on broad data at scale for a wide range of tasks—are often trained on large volumes of copyrighted material. Deploying these models can pose legal and ethical risks related to copyright.

  • Our review of U.S. fair use doctrine concludes that fair use is not guaranteed for foundation models as they can generate content that is not “transformative” enough compared to the copyrighted material. However, amid still evolving case law, the extent of copyright infringement risk and potency of a fair use defense remain uncertain.

  • To mitigate copyright risks, policymakers should consider making clarifications to fair use doctrine as it applies to AI training data while also encouraging good-faith technical mitigation strategies that align foundation models with fair use standards. Together, these strategies can maximize the benefits of foundation models while minimizing the moral, ethical, and legal harms of copyright violations.

  • In parallel, policymakers should investigate other policy mechanisms to ensure that artists, authors, and creators receive fair compensation and credit, whether they create with the assistance of AI tools or without them.

Executive Summary

Foundation models are often trained on large volumes of copyrighted material, including text on websites, images posted online, research papers, books, articles, and more. Deploying these models can pose legal and ethical risks. Under U.S. law, copyright for a piece of creative work is assigned “the moment it is created and fixed in a tangible form that is perceptible either directly or with the aid of a machine or device.” Most data used to train foundation models falls under this definition. For example, the Pile, a massive open-source language modeling dataset that has been used by Meta, Bloomberg, and others to train foundation models, contains a dataset of copyrighted, torrented e-books called Books3 that has become the focus of several ongoing lawsuits.

In the United States, AI researchers have long relied on the fair use doctrine to avoid copyright issues with training data. The fair use doctrine allows members of the public to use copyrighted materials in certain instances, notably when the use is “transformative.” However, existing fair use interpretations are increasingly being challenged: a class-action lawsuit accuses Microsoft, GitHub, and OpenAI of training systems on publicly published code without adequate credit; Getty Images is suing Stability AI, the maker of Stable Diffusion, for scraping its photos; and other significant AI-related legal actions are underway.

In our paper “Foundation Models and Fair Use,” we shed light on the urgency and uncertainty surrounding the copyright implications of foundation models. First, we review relevant aspects of U.S. case law on fair use to identify the potential risks of foundation models developed using copyrighted content. We highlight that fair use is not guaranteed and that the risk of copyright infringement is real, though its exact extent remains uncertain. Second, we discuss four technical strategies to help reduce the risk of potential copyright violations, while underscoring the need to develop more techniques that ensure foundation models behave in ways aligned with fair use.
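The brief does not enumerate those four strategies here, but one output-side mitigation often discussed in this space is filtering generations that reproduce long verbatim spans of protected training text. What follows is a minimal sketch of that idea, assuming a simple n-gram overlap check; the function names, the eight-token window, and the 20 percent threshold are illustrative choices, not the paper's specification.

```python
# Illustrative sketch (not the paper's method): flag a model generation
# when too many of its n-grams appear verbatim in a protected source text.

def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """Return the set of all n-grams in a token sequence."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def flags_verbatim_copy(generation: str, protected_texts: list[str],
                        n: int = 8, threshold: float = 0.2) -> bool:
    """True if at least `threshold` of the generation's n-grams occur
    verbatim in any single protected text. The window size and
    threshold are assumed values chosen for illustration."""
    gen_grams = ngrams(generation.split(), n)
    if not gen_grams:
        return False  # generation too short to evaluate
    return any(
        len(gen_grams & ngrams(text.split(), n)) / len(gen_grams) >= threshold
        for text in protected_texts
    )

# A generation that copies a protected passage word-for-word is flagged.
protected = ["it was the best of times it was the worst of times " * 3]
copied = "it was the best of times it was the worst of times"
print(flags_verbatim_copy(copied, protected))  # True
```

In a production setting the naive set intersection would be replaced by an indexed structure over the corpus (hashing or suffix arrays), but the filtering logic is the same: block or rewrite outputs whose verbatim overlap with protected text exceeds a tolerance consistent with fair use.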

We argue that the United States needs a two-pronged approach to addressing these copyright issues: a mix of legal and technical mitigations that would allow us to harness the positive impact of foundation models while reducing intellectual property harms to creators. Fair use is not a panacea. Machine learning researchers, lawmakers, and other stakeholders need to understand both U.S. copyright law and the technical mitigation measures that can help navigate the copyright questions of foundation models going forward.

Read Paper
Authors
  • Peter Henderson
  • Xuechen Li
  • Dan Jurafsky
  • Tatsunori Hashimoto
  • Mark A. Lemley
  • Percy Liang

Related Publications

Response to OSTP's Request for Information on Accelerating the American Scientific Enterprise
Rishi Bommasani, John Etchemendy, Surya Ganguli, Daniel E. Ho, Guido Imbens, James Landay, Fei-Fei Li, Russell Wald
Response to Request | Quick Read | Dec 26, 2025
Topics: Sciences (Social, Health, Biological, Physical); Regulation, Policy, Governance
Stanford scholars respond to a federal RFI on scientific discovery, calling for the government to support a new “team science” academic research model for AI-enabled discovery.

Beyond DeepSeek: China's Diverse Open-Weight AI Ecosystem and Its Policy Implications
Caroline Meinhardt, Sabina Nong, Graham Webster, Tatsunori Hashimoto, Christopher Manning
Issue Brief | Deep Dive | Dec 16, 2025
Topics: Foundation Models; International Affairs, International Security, International Development
Almost one year after the “DeepSeek moment,” this brief analyzes China’s diverse open-model ecosystem and examines the policy implications of these models’ widespread global diffusion.

Response to FDA's Request for Comment on AI-Enabled Medical Devices
Desmond C. Ong, Jared Moore, Nicole Martinez-Martin, Caroline Meinhardt, Eric Lin, William Agnew
Response to Request | Quick Read | Dec 02, 2025
Topics: Healthcare; Regulation, Policy, Governance
Stanford scholars respond to a federal RFC on evaluating AI-enabled medical devices, recommending policy interventions to help mitigate the harms of AI-powered chatbots used as therapists.

Russ Altman’s Testimony Before the U.S. Senate Committee on Health, Education, Labor, and Pensions
Russ Altman
Testimony | Quick Read | Oct 09, 2025
Topics: Healthcare; Regulation, Policy, Governance; Sciences (Social, Health, Biological, Physical)
In this testimony, presented at the U.S. Senate Committee on Health, Education, Labor, and Pensions hearing titled “AI’s Potential to Support Patients, Workers, Children, and Families,” Russ Altman highlights opportunities for congressional support to make AI applications for patient care and drug discovery stronger, safer, and human-centered.