Sanmi Koyejo | Beyond Benchmarks: Building a Science of AI Measurement | Stanford HAI

Status

Past

Date

Wednesday, March 19, 2025 12:00 PM - 1:15 PM PDT

Location

Gates Computer Science Building Room 119

Topics

Sciences (Social, Health, Biological, Physical)

The widespread deployment of AI systems in critical domains demands more rigorous approaches to evaluating their capabilities and safety.

Event Contact

Annie Benisch

abenisch@stanford.edu

While current evaluation practices rely on static benchmarks, these methods face fundamental efficiency, reliability, and real-world relevance challenges. This talk presents a path toward a measurement framework that bridges established psychometric principles with modern AI evaluation needs. We demonstrate how techniques from Item Response Theory, amortized computation, and predictability analysis can substantially improve the rigor and efficiency of AI evaluation. Through case studies in safety assessment and capability measurement, we show how this approach can enable more reliable, scalable, and meaningful evaluation of AI systems. This work points toward a broader vision: evolving AI evaluation from a collection of benchmarks into a rigorous measurement science that can effectively guide research, deployment, and policy decisions.

Speaker

Sanmi Koyejo, Assistant Professor of Computer Science, Stanford University; Faculty Affiliate, Stanford HAI

Watch the Event Recording