AI Audit Challenge

$71,000 Innovation Challenge to Design Better AI Audits

What's this about?

Artificial intelligence (AI) systems are being deployed every day, yet we lack the necessary tools to independently analyze and audit them. This challenge is about designing and creating applied tools that can assess whether deployed AI systems exhibit bias or carry potential for discrimination. Winning submissions will demonstrate how technical tools can be used to make it easier for humans to audit deployed AI systems or open source models.


Application

Applications were open from July 11 to October 10, 2022, 11:59 pm, Pacific Daylight Time.

The challenge is open to any legal entity (including natural persons) or group of legal entities, except public administrations, across the world.* Ideas and proposals are welcome from all sources, sectors, and types of organizations, including for-profit, not-for-profit, or private companies. Applications involving several organizations and/or from various countries are also possible.

Why enter?

Winners will be announced at an event early in 2023 where they will present their work to journalists, investors, policymakers, NGOs, and the wider public. The winning individuals or teams will receive cash prizes:

  • $25,000: 1st Place (Open-Source Model)
  • $25,000: 1st Place (Deployed System)
  • $12,000: 2nd Place
  • $9,000: 3rd Place


Who are we?

We are academics, policymakers, programmers, and technologists interested in developing better tools for AI governance and in bridging the worlds of engineers and regulators, of technology and policy. We believe that to realize the greatest benefits from AI systems, they must be safe, high quality, and trustworthy—which requires that they be accountable.


What's our objective?

This challenge was born from a desire to assess AI systems to determine whether they engage in prohibited discrimination. We are keen to catalyze and build on the larger body of work that already exists to interrogate and analyze these AI systems. Unlike other challenges, we are less motivated by publishing in academic journals and instead have chosen to prioritize impact through applied investigations, tools, and demonstrations.

We're broadly interested in the use of technical tools to generate additional information about AI systems, but a particularly valuable area of focus is harmful bias with respect to protected categories.

How can we assess the fairness properties of a given system? Is it possible, for instance, to analyze how well a computer vision system performs when confronted with photos of individuals from a variety of demographic backgrounds, or to examine the outputs of an NLP system tasked with generating content about different religions or about people from various socioeconomic and demographic backgrounds? What might help us better understand how open source and deployed AI systems handle protected characteristics and classes? Is it possible to identify indirect discrimination that operates through proxies and inferences? Ideally, proposed solutions should address one of the following (an illustrative sketch follows the list):

  • Open-source models later integrated in commercial products, such as GPT-NeoX-20B, BERT, GPT-J, YOLO, and PanGu-α.
  • Deployed systems in use by the public and private sector, such as COMPAS, GPT-3 and POL-INTEL.
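
As a concrete illustration of the kind of disaggregated analysis described above, here is a minimal sketch that compares a classifier's accuracy across demographic groups. The predict() callable and the group-annotated evaluation set are hypothetical placeholders standing in for whatever system and data an actual audit would use; this is a sketch of one approach, not a prescribed method.

    from collections import defaultdict

    def disaggregated_accuracy(examples, predict):
        """Compute accuracy separately for each demographic group.

        `examples` is an iterable of (input, true_label, group) triples and
        `predict` is any callable wrapping the system under audit; both are
        hypothetical placeholders for this sketch.
        """
        correct = defaultdict(int)
        total = defaultdict(int)
        for x, y_true, group in examples:
            total[group] += 1
            if predict(x) == y_true:
                correct[group] += 1
        return {group: correct[group] / total[group] for group in total}

    # A large accuracy gap between groups, e.g. {"group_a": 0.97, "group_b": 0.84},
    # is a signal that warrants closer investigation, not proof of illegal
    # discrimination on its own.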

For the successful and effective development of AI systems, it is critical that policymakers and technology developers work in tandem. As such, in addition to assembling a stellar jury, we have convened an advisory board to provide applicants with insights into how innovative AI models interact with legal concerns and obligations.


How is this project different from conferences like ACM FAccT?

While communities like FAccT focus on similar questions, this challenge is designed to directly fund the use (and demonstration) of technical tools for auditing publicly deployed systems and open source models.

In this case, our first target risk is bias and illegal discrimination. For example:

  • Word embeddings, which are used by language tools like Gmail’s SmartReply and Google Translate, frequently place terms such as “football” closer to male-associated words and “receptionist” closer to female-associated words (see the sketch after this list).
  • Emotion recognition tools from Microsoft and Face++ consistently interpret images of Black people as angrier than comparable images of white people, even when controlling for their degree of smiling.
  • Computer vision software from Amazon, Microsoft, and IBM performs significantly worse on people of color.
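
A minimal way to reproduce the word-embedding observation in the first bullet is to compare cosine similarities between occupation-related words and gendered pronouns. The sketch below uses the gensim library with a pretrained word2vec-format embedding file; the file name is an assumption, and the exact numbers depend on which embeddings are used.

    from gensim.models import KeyedVectors

    # Any word2vec-format embedding file can be substituted here; the path is
    # an assumption for this sketch.
    vectors = KeyedVectors.load_word2vec_format("embeddings.bin", binary=True)

    def gender_lean(word):
        """Positive values lean toward 'he', negative toward 'she' (cosine-similarity gap)."""
        return vectors.similarity(word, "he") - vectors.similarity(word, "she")

    for term in ["football", "receptionist", "nurse", "engineer"]:
        print(term, round(gender_lean(term), 3))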

The outputs we are interested in are software, code, and/or tools that allow people to test publicly available algorithms and deployed models for illegal bias and discrimination, in ways that are useful and actionable for the people most likely to use such tools—namely, regulators, civil society, and journalists.


How will the challenge work?

The challenge invites teams to submit models, solutions, datasets, and tools to improve people’s ability to audit AI systems for illegal discrimination. Submissions can either assess commercially deployed AI systems or open source AI systems that are known to be used within industry (e.g., the BERT language model). They can also be standalone applications. Examples might include analysis of commercial AI APIs for purposes as varied as computer vision, speech recognition, text generation, and facial recognition, offered by companies such as Amazon, Microsoft, OpenAI, and Google. Submissions could also involve the analysis of datasets and AI models which are understood to be inputs into deployed systems—for instance, BERT (used by both the Google and Microsoft Bing search engines), the ImageNet dataset (used as an input into a range of computer vision systems), or the YOLO family of algorithms (used in a range of video understanding systems).
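
As one illustration of probing an open-source model named above, the sketch below asks bert-base-uncased to fill in a masked pronoun in occupation templates and compares the probabilities it assigns to "he" versus "she". It relies on the Hugging Face transformers library; the prompts are our own illustrative examples, and this is a sketch of one possible probe rather than a required methodology.

    from transformers import pipeline

    # Masked-language-model probe of bert-base-uncased; the prompts below are
    # illustrative examples, not challenge-provided data.
    fill = pipeline("fill-mask", model="bert-base-uncased")

    prompts = [
        "The nurse said that [MASK] would be back in a minute.",
        "The engineer said that [MASK] would be back in a minute.",
    ]

    for prompt in prompts:
        predictions = fill(prompt, top_k=20)
        scores = {p["token_str"]: p["score"] for p in predictions}
        # Compare the probability mass the model places on gendered pronouns.
        print(prompt)
        print("  he:", round(scores.get("he", 0.0), 3),
              "she:", round(scores.get("she", 0.0), 3))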

Entrants will be evaluated by our jury, with points awarded for each of the following:

  • Insights: What did we learn using the tool?
  • Alignment: How well is the audit anchored in legal and policy needs?
  • Impact: How many people would benefit from the tool?
  • Ease of use: Is the tool usable for our target audience?
  • Scalability: Can the tool be used at scale and/or used in different contexts?
  • Replicability: Can the results be replicated by other users using the same systems?
  • Documentation: How well-explained are the findings?
  • Sustainability: Is the tool financially and environmentally sustainable?

Frequently Asked Questions

Do you have to enter in English? Yes.
Is there a limit to the number of entries? No. You can submit as many entries as you like.
Do contestants keep the intellectual property of their idea? Yes. You retain any and all IP rights to your entry, and any projects that may result from it.
Is entry to the challenge confidential? No, unless confidentiality is requested due to security, safety, or commercial concerns.
Are there strings attached to how the prize money is used? No. The prize money is yours to use as you like.


Contact Us

For questions, please contact us at algorithmicaudits@stanford.edu.

 
*This excludes organizations domiciled in a country or territory that would be prohibited to participate in the Challenge and/or receive grant money if declared a winner because of U.S. Department of Treasury Office of Foreign Assets Control (“OFAC”) rules, and any organization with whom a financial or other dealing with the challenge would be considered a “prohibited transaction” (defined by OFAC as trade or financial transactions and other dealings in which U.S. persons may not engage unless authorized by OFAC or expressly exempted by statute).