What are AI Benchmarks?

Better Benchmarks for Safety-Critical AI Applications

Nikki Goth Itoi

May 27

News


Stanford researchers investigate why models often fail in edge-case scenarios.

Machine Learning

AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning

Madeleine Grunde-McLaughlin, Ranjay Krishna, Maneesh Agrawala

Dec 27

Research

Extending the WILDS Benchmark for Unsupervised Adaptation

Shiori Sagawa, Pang Wei Koh, Tony Lee, Irena Gao, Sang Michael Xie, Kendrick Shen, Ananya Kumar, Weihua Hu, Michihiro Yasunaga, Henrik Marklund, Sara Beery, Etienne David, Ian Stavness, Wei Guo, Jure Leskovec, Kate Saenko, Tatsunori Hashimoto, Sergey Lev

Apr 24

Research

Sanmi Koyejo | Beyond Benchmarks: Building a Science of AI Measurement

Seminar | Mar 19, 2025 | 12:00 PM - 1:15 PM

The widespread deployment of AI systems in critical domains demands more rigorous approaches to evaluating their capabilities and safety.

Sciences (Social, Health, Biological, Physical)

LegalBench: Prototyping a Collaborative Benchmark for Legal Reasoning

Neel Guha, Daniel E. Ho, Julian Nyarko

Sep 14

Research

Stanford Develops Real-World Benchmarks for Healthcare AI Agents

Scott Hadly

Sep 15

News

Researchers are establishing standards to validate the efficacy of AI agents in clinical settings.

Healthcare

Stanford HAI Launches AI and Organizations Lab to Study Science of AI in the Workplace

Shana Lynch

May 13

News

The new center will examine AI's real-world impacts on jobs, teams, and organizational performance.

Industry, Innovation

Workforce, Labor