Policy Brief

Safeguarding Third-Party AI Research

Date: February 13, 2025
Topics: Privacy, Safety, Security; Regulation, Policy, Governance
Abstract

This brief examines the barriers to independent AI evaluation and proposes safe harbors to protect good-faith third-party research.

Key Takeaways

  • Third-party AI research is essential to ensure that AI companies do not grade their own homework, but few companies actively protect or promote such research.

  • We found that no major foundation model developers currently offer comprehensive protections for third-party evaluation. Instead, their policies often disincentivize it.

  • A safe harbor for good-faith research should be a top priority for policymakers: it protects researchers and increases the scale, diversity, and independence of evaluations.

Executive Summary

Third-party evaluation is a cornerstone of efforts to reduce the substantial risks posed by AI systems. AI is a vast field with thousands of highly specialized experts around the world who can help stress-test the most powerful systems. But few companies empower these researchers to test their AI systems, for fear of exposing flaws in their products. AI companies often block safety research with restrictive terms of service or by suspending researchers who report flaws.

In our paper, “A Safe Harbor for AI Evaluation and Red Teaming,” we assess the policies and practices of seven top developers of generative AI systems, finding that none offers comprehensive protections for third-party AI research. Unlike with cybersecurity, generative AI is a new field without well-established norms regarding flaw disclosure, safety standards, or mechanisms for conducting third-party research. We propose that developers adopt safe harbors to enable good-faith, adversarial testing of AI systems.

Introduction

Generative AI systems pose a wide range of potential risks, from enabling the creation of nonconsensual intimate imagery to facilitating the development of malware. Evaluating generative AI systems is crucial to understanding the technology, ensuring public accountability, and reducing these risks.

In July 2023, many prominent AI companies signed voluntary commitments at the White House, pledging to “incent third-party discovery and reporting of issues and vulnerabilities.” More than a year later, implementation of this commitment has been uneven. While some companies do reward researchers for finding security flaws in their AI systems, few companies strongly encourage research on safety or provide concrete protections for good-faith research practices. Instead, leading generative AI companies’ terms of service legally prohibit third-party safety and trustworthiness research, in effect threatening anyone who conducts such research with bans from their platforms or even legal action. For example, companies’ policies do not allow researchers to jailbreak AI systems like ChatGPT, Claude, or Gemini to assess potential threats to U.S. national security.

In March 2024, we penned an open letter signed by over 350 leading AI researchers and advocates calling for a safe harbor for third-party AI evaluation. The researchers noted that while security research on traditional software is protected by voluntary company protections (safe harbors), established vulnerability disclosure norms, and legal safeguards from the Department of Justice, AI safety and trustworthiness research lacks comparable protections.

Companies have continued to be opaque about key aspects of their most powerful AI systems, such as the data used to build their models. Developers of generative AI models tout the safety of their systems based on internal red teaming, but there is no way for the government or independent researchers to validate these results, as companies do not release reproducible evaluations.

Generative AI companies also impose barriers on their platforms that limit good-faith research. Similar issues plague social media: Companies have taken steps to prevent researchers and journalists from conducting investigations on their platforms, and these measures, together with federal legislation, have had a chilling effect on such research and worsened the spread of harmful content online. But conducting research on generative AI systems comes with additional challenges, as the content on generative AI platforms is not publicly available. Users need accounts to access AI-generated content, and those accounts can be restricted by the company that owns the platform. Many AI companies also block certain user requests and limit the functionality of their models to prevent researchers from unearthing issues related to safety or trustworthiness. The stakes are also higher for AI, which has the potential not only to turbocharge misinformation but also to provide U.S. adversaries like China and Russia with material strategic advantages.

To assess the state of independent evaluation for generative AI, our team of machine learning, law, and policy experts conducted a thorough review of seven major AI companies’ policies, access provisions, and related enforcement processes. We detail our experiences with evaluation of AI systems and potential barriers other third-party evaluators may face, and propose alternative practices and policies to enable broader community participation in AI evaluation.

Authors
  • Kevin Klyman
  • Shayne Longpre
  • Sayash Kapoor
  • Rishi Bommasani
  • Percy Liang
  • Peter Henderson

Related Publications

Response to OSTP’s Request for Information on the Development of an AI Action Plan
Caroline Meinhardt, Daniel Zhang, Rishi Bommasani, Jennifer King, Russell Wald, Percy Liang, Daniel E. Ho
Mar 17, 2025
Response to Request

In this response to a request for information issued by the National Science Foundation’s Networking and Information Technology Research and Development National Coordination Office (on behalf of the Office of Science and Technology Policy), scholars from Stanford HAI, CRFM, and RegLab urge policymakers to prioritize four areas of policy action in their AI Action Plan: 1) Promote open innovation as a strategic advantage for U.S. competitiveness; 2) Maintain U.S. AI leadership by promoting scientific innovation; 3) Craft evidence-based AI policy that protects Americans without stifling innovation; 4) Empower government leaders with resources and technical expertise to ensure a “whole-of-government” approach to AI governance.

Assessing the Implementation of Federal AI Leadership and Compliance Mandates
Jennifer Wang, Mirac Suzgun, Caroline Meinhardt, Daniel Zhang, Kazia Nowacki, Daniel E. Ho
Jan 17, 2025
White Paper

This white paper assesses federal efforts to advance leadership on AI innovation and governance through recent executive actions and emphasizes the need for senior-level leadership to achieve a whole-of-government approach.

What Makes a Good AI Benchmark?
Anka Reuel, Amelia Hardy, Chandler Smith, Max Lamparth, Malcolm Hardy, Mykel Kochenderfer
Dec 11, 2024
Policy Brief

This brief presents a novel assessment framework for evaluating the quality of AI benchmarks and scores 24 benchmarks against the framework.

Response to U.S. AI Safety Institute’s Request for Comment on Managing Misuse Risk For Dual-Use Foundation Models
Rishi Bommasani, Alexander Wan, Yifan Mai, Percy Liang, Daniel E. Ho
Sep 09, 2024
Response to Request

In this response to the U.S. AI Safety Institute’s (US AISI) request for comment on its draft guidelines for managing the misuse risk for dual-use foundation models, scholars from Stanford HAI, the Center for Research on Foundation Models (CRFM), and the Regulation, Evaluation, and Governance Lab (RegLab) urge the US AISI to strengthen its guidance on reproducible evaluations and third-party evaluations, as well as clarify guidance on post-deployment monitoring. They also encourage the institute to develop similar guidance for other actors in the foundation model supply chain and for non-misuse risks, while ensuring the continued open release of foundation models absent evidence of marginal risk.
