Policy Brief

Safeguarding Third-Party AI Research

Date: February 13, 2025
Topics: Privacy, Safety, Security; Regulation, Policy, Governance
Abstract

This brief examines the barriers to independent AI evaluation and proposes safe harbors to protect good-faith third-party research.

Key Takeaways

  • Third-party AI research is essential to ensure that AI companies do not grade their own homework, but few companies actively protect or promote such research.

  • We found that no major foundation model developers currently offer comprehensive protections for third-party evaluation. Instead, their policies often disincentivize it.

  • A safe harbor for good-faith research should be a top priority for policymakers: it protects researchers and increases the scale, diversity, and independence of evaluations.

Executive Summary

Third-party evaluation is a cornerstone of efforts to reduce the substantial risks posed by AI systems. AI is a vast field with thousands of highly specialized experts around the world who can help stress-test the most powerful systems. But few companies empower these researchers to test their AI systems, for fear of exposing flaws in their products. AI companies often block safety research with restrictive terms of service or by suspending researchers who report flaws.

In our paper, “A Safe Harbor for AI Evaluation and Red Teaming,” we assess the policies and practices of seven top developers of generative AI systems, finding that none offers comprehensive protections for third-party AI research. Unlike with cybersecurity, generative AI is a new field without well-established norms regarding flaw disclosure, safety standards, or mechanisms for conducting third-party research. We propose that developers adopt safe harbors to enable good-faith, adversarial testing of AI systems.

Introduction

Generative AI systems pose a wide range of potential risks, from enabling the creation of nonconsensual intimate imagery to facilitating the development of malware. Evaluating generative AI systems is crucial to understanding the technology, ensuring public accountability, and reducing these risks.

In July 2023, many prominent AI companies signed voluntary commitments at the White House, pledging to “incent third-party discovery and reporting of issues and vulnerabilities.” More than a year later, implementation of this commitment has been uneven. While some companies do reward researchers for finding security flaws in their AI systems, few companies strongly encourage research on safety or provide concrete protections for good-faith research practices. Instead, leading generative AI companies’ terms of service legally prohibit third-party safety and trustworthiness research, in effect threatening anyone who conducts such research with bans from their platforms or even legal action. For example, companies’ policies do not allow researchers to jailbreak AI systems like ChatGPT, Claude, or Gemini to assess potential threats to U.S. national security.

In March 2024, we penned an open letter signed by over 350 leading AI researchers and advocates calling for a safe harbor for third-party AI evaluation. The researchers noted that while security research on traditional software is protected by voluntary company protections (safe harbors), established vulnerability disclosure norms, and legal safeguards from the Department of Justice, AI safety and trustworthiness research lacks comparable protections.

Companies have continued to be opaque about key aspects of their most powerful AI systems, such as the data used to build their models. Developers of generative AI models tout the safety of their systems based on internal red teaming, but there is no way for the government or independent researchers to validate these results, as companies do not release reproducible evaluations.

Generative AI companies also impose barriers on their platforms that limit good-faith research. Similar issues plague social media: Companies have taken steps to prevent researchers and journalists from conducting investigations on their platforms, and these measures, together with federal legislation, have had a chilling effect on such research and worsened the spread of harmful content online. But conducting research on generative AI systems comes with additional challenges, as the content on generative AI platforms is not publicly available. Users need accounts to access AI-generated content, and those accounts can be restricted by the company that owns the platform. Many AI companies also block certain user requests and limit the functionality of their models to prevent researchers from unearthing issues related to safety or trustworthiness. The stakes are also higher for AI, which has the potential not only to turbocharge misinformation but also to provide U.S. adversaries like China and Russia with material strategic advantages.

To assess the state of independent evaluation for generative AI, our team of machine learning, law, and policy experts conducted a thorough review of seven major AI companies’ policies, access provisions, and related enforcement processes. We detail our experiences with evaluation of AI systems and potential barriers other third-party evaluators may face, and propose alternative practices and policies to enable broader community participation in AI evaluation.

Authors
  • Kevin Klyman
  • Shayne Longpre
  • Sayash Kapoor
  • Rishi Bommasani
  • Percy Liang
  • Peter Henderson

Related Publications

Response to OSTP’s Request for Information on the Development of an AI Action Plan
Caroline Meinhardt, Daniel Zhang, Rishi Bommasani, Jennifer King, Russell Wald, Percy Liang, Daniel E. Ho
Mar 17, 2025
Response to Request

In this response to a request for information issued by the National Science Foundation’s Networking and Information Technology Research and Development National Coordination Office (on behalf of the Office of Science and Technology Policy), scholars from Stanford HAI, CRFM, and RegLab urge policymakers to prioritize four areas of policy action in their AI Action Plan: 1) Promote open innovation as a strategic advantage for U.S. competitiveness; 2) Maintain U.S. AI leadership by promoting scientific innovation; 3) Craft evidence-based AI policy that protects Americans without stifling innovation; 4) Empower government leaders with resources and technical expertise to ensure a “whole-of-government” approach to AI governance.

Assessing the Implementation of Federal AI Leadership and Compliance Mandates
Jennifer Wang, Mirac Suzgun, Caroline Meinhardt, Daniel Zhang, Kazia Nowacki, Daniel E. Ho
Jan 17, 2025
White Paper

This white paper assesses federal efforts to advance leadership on AI innovation and governance through recent executive actions and emphasizes the need for senior-level leadership to achieve a whole-of-government approach.

What Makes a Good AI Benchmark?
Anka Reuel, Amelia Hardy, Chandler Smith, Max Lamparth, Malcolm Hardy, Mykel Kochenderfer
Dec 11, 2024
Policy Brief

This brief presents a novel assessment framework for evaluating the quality of AI benchmarks and scores 24 benchmarks against the framework.

Response to U.S. AI Safety Institute’s Request for Comment on Managing Misuse Risk For Dual-Use Foundation Models
Rishi Bommasani, Alexander Wan, Yifan Mai, Percy Liang, Daniel E. Ho
Sep 09, 2024
Response to Request

In this response to the U.S. AI Safety Institute’s (US AISI) request for comment on its draft guidelines for managing the misuse risk for dual-use foundation models, scholars from Stanford HAI, the Center for Research on Foundation Models (CRFM), and the Regulation, Evaluation, and Governance Lab (RegLab) urge the US AISI to strengthen its guidance on reproducible evaluations and third-party evaluations, as well as clarify guidance on post-deployment monitoring. They also encourage the institute to develop similar guidance for other actors in the foundation model supply chain and for non-misuse risks, while ensuring the continued open release of foundation models absent evidence of marginal risk.
