Sanmi Koyejo | Beyond Benchmarks: Building a Science of AI Measurement | Stanford HAI
Seminar

Sanmi Koyejo | Beyond Benchmarks: Building a Science of AI Measurement

Status: Past
Date: Wednesday, March 19, 2025, 12:00 PM - 1:15 PM PST/PDT
Location: Gates Computer Science Building, Room 119
Topics: Sciences (Social, Health, Biological, Physical)
The widespread deployment of AI systems in critical domains demands more rigorous approaches to evaluating their capabilities and safety.

While current evaluation practices rely on static benchmarks, these methods face fundamental efficiency, reliability, and real-world relevance challenges. This talk presents a path toward a measurement framework that bridges established psychometric principles with modern AI evaluation needs. We demonstrate how techniques from Item Response Theory, amortized computation, and predictability analysis can substantially improve the rigor and efficiency of AI evaluation. Through case studies in safety assessment and capability measurement, we show how this approach can enable more reliable, scalable, and meaningful evaluation of AI systems. This work points toward a broader vision: evolving AI evaluation from a collection of benchmarks into a rigorous measurement science that can effectively guide research, deployment, and policy decisions.

Speaker
Sanmi Koyejo
Assistant Professor of Computer Science, Stanford University; Faculty Affiliate, Stanford HAI

Event Contact
Annie Benisch
abenisch@stanford.edu