Over the past decade, researchers have warned of the flaws and vulnerabilities inherent to certain AI systems, such as the difficulty of evaluating their safety, legality and effectiveness, and their potential discriminatory effects. Increasingly, we see these harms materializing and affecting people in real-life settings: Last year, a New Jersey man was wrongly accused of shoplifting and of trying to hit an officer with a car because of an incorrect facial recognition match. A drug addiction risk algorithm was recently found to have a disparate impact on women. Graduate researchers discovered that facial recognition systems deployed by the private sector displayed significant biases. More recently, researchers have also focused their attention on how large-scale language models often capture undesirable societal biases.
Unfortunately, it remains very difficult for regulators, journalists, policymakers and the third sector more widely to evaluate these algorithms and test them for potential discriminatory impacts. As the Wired article linked above notes, the proprietary nature of many deployed algorithms means that “there’s no way to look under the hood to inspect them for errors or biases.” In 2020, a group of respected academics and practitioners published a paper calling for better tools to audit algorithmic systems, noting that the process of AI development is opaque and that too many barriers make it impossible for third parties to verify the claims made by developers.
Challenge and Cash Prizes
That’s why Stanford’s Cyber Policy Center and the Stanford Institute for Human-Centered Artificial Intelligence (HAI) have launched a challenge with prizes of up to $25,000 to encourage developers to create better and more usable approaches to auditing AI systems.
Visit the AI Audit Challenge Website
This challenge is generously funded by the Rockefeller Foundation and will focus on tools to assess whether deployed AI systems illegally discriminate against protected categories. For example, can we analyze how well a computer vision system performs when confronted with pictures of people from different demographic backgrounds? Can we grade the output of a natural language processing system asked to produce content on different religions?
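One concrete way to approach the first of these questions is a disaggregated evaluation: run the system on a labeled test set and compare its error rates across demographic groups rather than reporting a single aggregate score. The sketch below, in Python with pandas, is a minimal illustration of that idea; the records, column names and group labels are hypothetical placeholders, not part of any challenge requirement.

```python
# Minimal sketch of a disaggregated (per-group) accuracy audit.
# The data below is hypothetical; in a real audit, `label` would come from
# a held-out test set and `prediction` from the system being evaluated.
import pandas as pd

records = pd.DataFrame({
    "group":      ["A", "A", "A", "B", "B", "B", "C", "C"],
    "label":      [1,   0,   1,   1,   0,   1,   0,   1],
    "prediction": [1,   0,   1,   0,   0,   0,   0,   1],
})

# Accuracy computed separately for each demographic group.
per_group = (
    records.assign(correct=records["label"] == records["prediction"])
           .groupby("group")["correct"]
           .mean()
)

print(per_group)
# The headline number for an audit: the gap between the best- and
# worst-served groups, which a single aggregate accuracy would hide.
print("accuracy gap:", per_group.max() - per_group.min())
```

The same disaggregation logic carries over to the second question, applied to generated text rather than classification accuracy.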
Why Auditing Matters
Being able to audit AI systems brings substantial benefits: It allows public officials or journalists to verify the statements companies make about the efficacy of their algorithms, thereby reducing the risk of fraud and misrepresentation. It fosters competition on the quality and accuracy of AI systems. It could also allow governments to establish high-level objectives without being overly prescriptive about the means of reaching them. Being able to detect and evaluate the potential harm caused by various algorithmic applications is crucial to the democratic governance of AI systems.
The problem is that auditing algorithms is very difficult. AI systems are not simply a few lines of code, but complex sociotechnical systems consisting of a mixture of technical choices and social practices. Context matters greatly, and what is acceptable in one setting might not be in another – for example, an algorithm used in a medical or social welfare setting will require far more scrutiny than an algorithm used to generate music. But even the core technical components – the algorithm itself, the compute and the training sets – remain very difficult to scrutinize properly, even more so when a proprietary AI system is deployed. Datasets used in machine learning are also frequently incomplete and unrepresentative of different population groups, yet looking under the hood is, in practice, all but impossible.
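To make that last point concrete: if a training set were open to inspection, one of the simplest checks an auditor could run is a comparison of group representation in the data against reference population shares. The sketch below assumes hypothetical group labels and made-up reference figures purely for illustration.

```python
# Minimal sketch: compare the demographic make-up of a training set
# against reference population shares. All figures are hypothetical.
from collections import Counter

training_groups = ["A"] * 700 + ["B"] * 250 + ["C"] * 50   # stand-in for real metadata
reference_share = {"A": 0.50, "B": 0.30, "C": 0.20}         # e.g. census-derived shares

counts = Counter(training_groups)
total = sum(counts.values())

for group, expected in reference_share.items():
    observed = counts.get(group, 0) / total
    print(f"group {group}: {observed:.0%} of training data vs {expected:.0%} of population "
          f"({'under' if observed < expected else 'over'}-represented)")
```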
We believe there is an urgent need for better tools to test these algorithms. The risks of harmful algorithmic systems are well known; now is the time to act and build the toolkits that will empower the policymakers, activists and white-hat hackers of the future. It is with this backdrop in mind that we decided to launch an AI Audit Challenge, with an initial focus on tools that detect bias and discrimination.
The Challenge Process
We invite submissions examining both open-source models later integrated into commercial products (such as BERT and YOLO) and deployed systems in use by the public and private sectors (such as COMPAS, GPT-3 and POL-INTEL), with the aim of better understanding how these systems handle protected characteristics and classes and of identifying indirect discrimination.
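As one illustration of the kind of probe a submission might build on, the sketch below queries an open-source masked language model with templated sentences that vary only a demographic term and compares the completions it ranks highest. It assumes the Hugging Face transformers library and the public bert-base-uncased checkpoint; the template and terms are illustrative, and a serious audit would require a far more systematic, validated probe set.

```python
# Sketch of a template-based probe for a masked language model.
# Requires the `transformers` library and downloads the public
# bert-base-uncased checkpoint on first run.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Illustrative template and demographic terms; a real audit would use a
# systematically constructed and validated probe set.
template = "The {term} person worked as a [MASK]."
terms = ["young", "old", "christian", "muslim"]

for term in terms:
    completions = fill_mask(template.format(term=term), top_k=3)
    top = [(c["token_str"].strip(), round(c["score"], 3)) for c in completions]
    print(f"{term:>10}: {top}")

# Systematic differences in the top completions across terms are a signal
# worth investigating, not proof of illegal discrimination on their own.
```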
Submissions will be evaluated by a jury including Mozilla fellow Deborah Raji, Montreal AI Ethics Institute founder Abhishek Gupta and DeepMind senior research scientist William Isaac. Participants will also have the opportunity to iterate on their work through workshops and to receive advice and support from an advisory board whose members include Professor Safiya Noble of UCLA and former U.S. Ambassador to the United Nations Eileen Donahoe. Two first-place winners will receive $25,000, with additional awards for second and third place.
We believe that to get the most out of AI systems, they must first and foremost respect civil rights law, and they should also be safe, high quality and trustworthy. With this challenge, we hope to catalyze and build on the larger body of work concerned with interrogating these systems in order to create pragmatic policy, regulatory and governance approaches.
Learn more and submit your entry at the AI Audit Challenge Website.
Stanford HAI’s mission is to advance AI research, education, policy and practice to improve the human condition. Learn more.