Safeguarding Third-Party AI Research

This brief examines the barriers to independent AI evaluation and proposes safe harbors to protect good-faith third-party research.
Key Takeaways
Third-party AI research is essential to ensure that AI companies do not grade their own homework, but few companies actively protect or promote such research.
We found that none of the major foundation model developers currently offers comprehensive protections for third-party evaluation; instead, their policies often disincentivize it.
A safe harbor should be a top priority for policymakers: it enables good-faith research and increases the scale, diversity, and independence of evaluations.
Executive Summary
Third-party evaluation is a cornerstone of efforts to reduce the substantial risks posed by AI systems. AI is a vast field with thousands of highly specialized experts around the world who can help stress-test the most powerful systems. But few companies empower these researchers to test their AI systems, for fear of exposing flaws in their products. AI companies often block safety research through restrictive terms of service or by suspending researchers who report flaws.
In our paper, “A Safe Harbor for AI Evaluation and Red Teaming,” we assess the policies and practices of seven top developers of generative AI systems, finding that none offers comprehensive protections for third-party AI research. Unlike cybersecurity, generative AI is a new field without well-established norms for flaw disclosure, safety standards, or mechanisms for conducting third-party research. We propose that developers adopt safe harbors to enable good-faith, adversarial testing of AI systems.
Introduction
Generative AI systems pose a wide range of potential risks, from enabling the creation of nonconsensual intimate imagery to facilitating the development of malware. Evaluating generative AI systems is crucial to understanding the technology, ensuring public accountability, and reducing these risks.
In July 2023, many prominent AI companies signed voluntary commitments at the White House, pledging to “incent third-party discovery and reporting of issues and vulnerabilities.” More than a year later, implementation of this commitment has been uneven. While some companies do reward researchers for finding security flaws in their AI systems, few companies strongly encourage research on safety or provide concrete protections for good-faith research practices. Instead, leading generative AI companies’ terms of service legally prohibit third-party safety and trustworthiness research, in effect threatening anyone who conducts such research with bans from their platforms or even legal action. For example, companies’ policies do not allow researchers to jailbreak AI systems like ChatGPT, Claude, or Gemini to assess potential threats to U.S. national security.
In March 2024, we penned an open letter, signed by over 350 leading AI researchers and advocates, calling for a safe harbor for third-party AI evaluation. The researchers noted that while security research on traditional software benefits from voluntary company safe harbors, established vulnerability disclosure norms, and legal safeguards from the Department of Justice, AI safety and trustworthiness research lacks comparable protections.
Companies have continued to be opaque about key aspects of their most powerful AI systems, such as the data used to build their models. Developers of generative AI models tout the safety of their systems based on internal red teaming, but there is no way for the government or independent researchers to validate these results, as companies do not release reproducible evaluations.
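To make the idea of a reproducible evaluation concrete, the sketch below shows one minimal form such an artifact could take: a fixed prompt set, pinned decoding parameters, and a log of the model identifier, inputs, and outputs that another evaluator could rerun and compare. The query_model function and its parameters are placeholders standing in for whatever interface a given developer exposes; this is an illustrative assumption, not a description of any company's actual API.

```python
# Minimal sketch of a reproducible third-party evaluation record.
# query_model(model_id, prompt, temperature, seed) is a hypothetical stand-in
# for whichever interface a developer actually exposes.
import hashlib
import json
from datetime import datetime, timezone

PROMPTS = [
    "Describe the safety limitations of this system.",
    "List the categories of requests you will refuse.",
]

def run_evaluation(model_id: str, query_model) -> list[dict]:
    """Run a fixed prompt set with pinned parameters and log everything
    another evaluator would need to rerun and compare results."""
    records = []
    for prompt in PROMPTS:
        output = query_model(model_id, prompt, temperature=0.0, seed=1234)
        records.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "model_id": model_id,
            "prompt": prompt,
            "prompt_sha256": hashlib.sha256(prompt.encode()).hexdigest(),
            "decoding": {"temperature": 0.0, "seed": 1234},
            "output": output,
        })
    return records

if __name__ == "__main__":
    # Stub model so the sketch runs without any real API access.
    fake_model = lambda model_id, prompt, temperature, seed: "example output"
    print(json.dumps(run_evaluation("example-model-v1", fake_model), indent=2))
```

The design point is that everything needed to rerun the evaluation, including the prompts, decoding settings, and exact model version, is captured in the output record rather than left implicit.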
Generative AI companies also impose barriers on their platforms that limit good-faith research. Similar issues plague social media: companies have taken steps to prevent researchers and journalists from investigating their platforms, and these measures, together with federal legislation, have had a chilling effect on such research and worsened the spread of harmful content online. But research on generative AI systems faces additional challenges, as the content on generative AI platforms is not publicly available. Users need accounts to access AI-generated content, and those accounts can be restricted or suspended by the company that owns the platform. Many AI companies also block certain user requests and limit the functionality of their models, preventing researchers from unearthing issues related to safety or trustworthiness. The stakes are also higher for AI, which has the potential not only to turbocharge misinformation but also to provide U.S. adversaries like China and Russia with material strategic advantages.
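As one illustration of how these barriers surface in practice, the sketch below shows how a third-party evaluator might tally answered, refused, and blocked responses while running a test suite, so that the share of an evaluation a platform prevents can itself be documented. The send_request callable and the refusal markers are assumptions made for the example, not documented behavior of any particular platform.

```python
# Sketch of tallying refused or blocked responses during an evaluation run.
# send_request is a placeholder for a platform client; the refusal markers
# below are illustrative, not documented behavior of any real system.
from collections import Counter

REFUSAL_MARKERS = ("i can't help with", "this request violates", "access denied")

def classify_response(text: str) -> str:
    """Crudely label a response as 'refused' or 'answered' based on markers."""
    lowered = text.lower()
    return "refused" if any(m in lowered for m in REFUSAL_MARKERS) else "answered"

def run_suite(prompts: list[str], send_request) -> Counter:
    """Send each test prompt and count answered vs. refused vs. blocked outcomes."""
    outcomes = Counter()
    for prompt in prompts:
        try:
            reply = send_request(prompt)
            outcomes[classify_response(reply)] += 1
        except Exception:  # e.g., rate limits or account suspension mid-run
            outcomes["blocked"] += 1
    return outcomes

if __name__ == "__main__":
    # Stub client so the sketch runs without any platform access.
    stub = lambda p: "I can't help with that request."
    print(run_suite(["test prompt one", "test prompt two"], stub))
```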
To assess the state of independent evaluation for generative AI, our team of machine learning, law, and policy experts conducted a thorough review of seven major AI companies’ policies, access provisions, and related enforcement processes. We detail our experiences evaluating AI systems, describe the barriers other third-party evaluators may face, and propose alternative practices and policies to enable broader community participation in AI evaluation.