A Framework to Report AI’s Flaws

Pointing to "white-hat" hacking, AI policy experts recommend a new system of third-party reporting and tracking of AI’s flaws.
As users of today’s artificial intelligence tools know, AI often hallucinates, offering erroneous information as fact, or reveals issues in its data or training. But even the best-intentioned user can find it difficult to notify AI developers of flaws in their models, and harder still to get them addressed.
Now, a team of computer scientists and policy experts at Stanford University, the Massachusetts Institute of Technology, and a dozen other institutions is proposing a new way for outside, third-party users to report flaws and to track whether and how AI developers address them.
In the paper “In-House Evaluation Is Not Enough: Towards Robust Third-Party Flaw Disclosure for General-Purpose AI,” published on the preprint server arXiv, the research team proposes a broad framework for responsible third-party discovery and disclosure of flaws in AI, and for developers to report their efforts to address them.
“We’re at a moment where these AI systems are being deployed to hundreds of millions of people at a time. But the infrastructure to identify and fix flaws at AI companies lags far behind other fields, like cybersecurity and software development,” says co-first author Shayne Longpre, a PhD student at MIT.
White-hat Culture
In those more mature fields, Longpre says, there is a robust culture of reporting and remediating flaws, including so-called “bug bounties,” in which well-intentioned external parties are paid to find and report them.
Such a collaborative culture has not yet developed in the AI field. To date, companies have relied on in-house teams or pre-approved contractors to identify flaws, but those efforts have proved inadequate to surface the breadth and complexity of real-world risks.
Flaws in AI differ from the security gaps of cybersecurity or the bugs of the software industry. They can range from the aforementioned hallucinations to problems in the data, such as racial bias in medical imaging. Often these flaws can be discovered only once models are live and in use by millions of people. Who better to find and surface them, Longpre asks rhetorically, than neutral third parties who simply want AI that works?
Three Recommendations
Against that backdrop, the scholars offer three recommendations.
First, they offer a standardized AI flaw report template (see below) governed by a set of good-faith guidelines. Here the authors draw inspiration from cybersecurity’s white-hat hacking culture, which encourages well-meaning parties to actively search for flaws while adhering to clear rules of engagement, including doing no harm to users, safeguarding privacy, and reporting responsibly.

A flaw report card contains common elements of disclosure borrowed from software security, intended to make flaws easier to reproduce and to triage.
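The paper spells out the template’s exact contents; purely as an illustration of the idea, the sketch below shows one way a machine-readable flaw report might be structured. The field names are assumptions modeled on common software-security disclosure practice, not the authors’ template.

```python
from dataclasses import dataclass, field, asdict
from datetime import date
import json

# Illustrative sketch only: field names are assumptions modeled on common
# software-security disclosure practice, not the paper's actual template.
@dataclass
class FlawReport:
    report_id: str                       # unique identifier for tracking
    reporter_contact: str                # how developers can follow up
    affected_systems: list[str]          # model names and versions observed
    description: str                     # what the flaw is and why it matters
    reproduction_steps: list[str]        # prompts or inputs that trigger it
    severity: str                        # e.g. "low", "medium", or "high"
    potential_harm: str                  # who could be affected, and how
    possibly_transferable: bool = False  # may affect similarly trained models
    date_reported: str = field(default_factory=lambda: date.today().isoformat())

# Example: serialize a report for submission through a disclosure channel.
report = FlawReport(
    report_id="FR-2025-0042",
    reporter_contact="researcher@example.org",
    affected_systems=["example-chat-model v3.1"],
    description="Model asserts a fabricated citation as established fact.",
    reproduction_steps=["Ask for sources on a niche topic", "Request a direct quote"],
    severity="medium",
    potential_harm="Users may rely on invented references in research or reporting.",
    possibly_transferable=True,
)
print(json.dumps(asdict(report), indent=2))
```

Structured fields along these lines are what make a report reproducible and easy to triage; a flag like "possibly transferable" is one way a reporter could signal that other developers may be affected.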
Second, the authors encourage companies to offer legal protections for research. Many AI companies use legal restrictions to dissuade outsiders from probing, reverse engineering, or testing their models. The authors call for legal and technical safe harbors to protect good-faith evaluators from lawsuits or punitive account bans. Such frameworks work well in high-stakes cybersecurity settings, says co-first author Ruth Appel, a Stanford Impact Labs postdoctoral fellow.
Third, the authors call for a “Disclosure Coordination Center,” a sort of public clearinghouse of known flaws and of developers’ efforts to address them. This is perhaps the most ambitious of the three proposals. Flaws in AI are often “transferable,” Appel explains: a flaw found in one system can often be found in others trained on similar data or built with similar architectures. A Disclosure Coordination Center would standardize and streamline communication across the AI industry and provide a measure of public accountability.
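The paper does not prescribe how such a center would route reports internally. As a purely hypothetical sketch of the coordination idea, the toy routine below fans a transferable report out to every developer whose models share the affected training lineage; the registry, field names, and routing rule are all assumptions made for the example.

```python
# Hypothetical registry for the sketch: which developers maintain models
# built on each shared data or architecture family.
MODEL_FAMILY_REGISTRY = {
    "open-web-corpus-2024": ["DeveloperA", "DeveloperB"],
    "proprietary-dialogue-v2": ["DeveloperC"],
}

def developers_to_notify(report: dict) -> set[str]:
    """Return the developers who should receive a flaw report.

    The originating developer is always notified; if the reporter flags the
    flaw as possibly transferable, developers of models in the same families
    are notified as well.
    """
    recipients = {report["origin_developer"]}
    if report.get("possibly_transferable"):
        for family in report.get("related_families", []):
            recipients.update(MODEL_FAMILY_REGISTRY.get(family, []))
    return recipients

# Example: a bias flaw observed in DeveloperA's model that likely also
# affects other models trained on the same corpus.
example_report = {
    "origin_developer": "DeveloperA",
    "possibly_transferable": True,
    "related_families": ["open-web-corpus-2024"],
}
print(developers_to_notify(example_report))  # e.g. {'DeveloperA', 'DeveloperB'}
```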
“Our goal is to make flaw reporting more systematic and coordinated, which will help users and model developers alike,” says senior author Percy Liang, associate professor of computer science.
Early Adopters
Under the current arrangement, really a stopgap process that developed organically, flaws are either emailed discreetly to the developer, never to be seen again, or blasted on social media in what amounts to public shaming. Neither approach ensures the problem gets fixed, Longpre says.
“There was a time when software companies didn’t want to hear about security bugs either,” Longpre adds. “But we learned that sunlight is the best disinfectant.”
Appel is optimistic about adoption. The team plans to build a prototype website for submitting standardized flaw reports and is in talks with partners about piloting the Disclosure Coordination Center concept.
“We need companies to change their policies, for researchers to start using these reporting tools, and for the ecosystem to invest in shared processes and infrastructure,” says Longpre. “This can’t just be a framework on paper — it has to become common and accepted practice.”