Foundation Models | Stanford HAI

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

Zhengxuan Wu, Atticus Geiger, Jing Huang, Noah Goodman, Christopher Potts, Aryaman Arora, Zheng Wang

Jun 01, 2024

Interventions on model-internal states are fundamental operations in many areas of AI, including model editing, steering, robustness, and interpretability. To facilitate such research, we introduce pyvene, an open-source Python library that supports customizable interventions on a range of different PyTorch modules. pyvene supports complex intervention schemes with an intuitive configuration format, and its interventions can be static or include trainable parameters. We show how pyvene provides a unified and extensible framework for performing interventions on neural models and for sharing intervened-upon models with others. We illustrate the power of the library through interpretability analyses using causal abstraction and knowledge localization. We publish our library through the Python Package Index (PyPI) and provide code, documentation, and tutorials at https://github.com/stanfordnlp/pyvene.
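
The abstract describes interventions on model-internal states only in prose. The following is a minimal, stdlib-only sketch of one such operation, an interchange intervention of the kind used in causal abstraction analyses: a hidden state recorded on a "source" input is copied into a forward pass on a "base" input. It is not pyvene's API. The toy two-layer network and the helper names `run` and `interchange` are hypothetical, and the sketch uses Python lists where pyvene works with PyTorch modules.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]
Intervention = Callable[[Vector], Vector]


def run(layers: Sequence[Layer], x: Vector,
        interventions: Optional[Dict[int, Intervention]] = None
        ) -> Tuple[Vector, List[Vector]]:
    """Forward pass that records each layer's output.

    `interventions` maps a layer index to a function that rewrites that
    layer's output before it is fed to the next layer (a hook-style edit
    of model-internal state). Hypothetical helper, not a pyvene function.
    """
    interventions = interventions or {}
    states: List[Vector] = []
    h = list(x)
    for i, layer in enumerate(layers):
        h = layer(h)
        if i in interventions:
            h = interventions[i](h)
        states.append(list(h))
    return h, states


def interchange(source_state: Vector, dims: Sequence[int]) -> Intervention:
    """Static interchange intervention: copy selected dimensions of a hidden
    state recorded on a source input into the base run's hidden state."""
    def apply(base_state: Vector) -> Vector:
        patched = list(base_state)
        for d in dims:
            patched[d] = source_state[d]
        return patched
    return apply


# Hypothetical toy network computing (a + b) * c.
# Hypothesis under test: unit 0 of layer 0 stores the intermediate value a + b.
layers: List[Layer] = [
    lambda v: [v[0] + v[1], v[2]],  # layer 0: [a + b, c]
    lambda v: [v[0] * v[1]],        # layer 1: [(a + b) * c]
]

base, source = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
_, source_states = run(layers, source)
patched_out, _ = run(layers, base, {0: interchange(source_states[0], dims=[0])})
print(patched_out)  # [27.0], i.e. (4 + 5) * 3, matching the high-level prediction
```

If the patched output matches what a high-level causal model predicts when its a + b variable is set to the source value, that is evidence the unit encodes that variable. In this sketch, a trainable intervention would swap `interchange` for a function with learnable parameters while keeping the same hook point.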

Natural Language Processing

Generative AI

Machine Learning

Foundation Models

Research

What Makes a Good AI Benchmark?

Anka Reuel, Amelia Hardy, Chandler Smith, Max Lamparth, Malcolm Hardy, Mykel Kochenderfer

Quick Read · Dec 11, 2024

Policy Brief

This brief presents a novel assessment framework for evaluating the quality of AI benchmarks and uses it to score 24 benchmarks.

Foundation Models

Privacy, Safety, Security

Squashing ‘Fantastic Bugs’: Researchers Look to Fix Flaws in AI Benchmarks

Andrew Myers

Dec 08, 2025

News

After evaluating thousands of benchmarks that AI developers use to assess the quality of their new models, a team of Stanford researchers reports that 5% may have serious flaws with major ramifications.

Foundation Models

Generative AI

A Large Scale RCT on Effective Error Messages in CS1

Sierra Wang, John Mitchell, Christopher Piech

Mar 07, 2024

Research

In this paper, we evaluate the effectiveness of different error message types through a large-scale randomized controlled trial conducted in an open-access, online introductory computer science course with 8,762 students from 146 countries. We assess existing error message enhancement strategies, as well as two novel approaches of our own: (1) generating error messages in real time using OpenAI's GPT and (2) constructing error messages that incorporate the course discussion forum. By examining students' direct responses to error messages and their behavior throughout the course, we quantitatively evaluate the immediate and longer-term efficacy of different error message types. We find that students using GPT-generated error messages repeat an error 23.1% less often in the subsequent attempt, and resolve an error in 34.8% fewer additional attempts, compared to students using standard error messages. We also analyze results across demographic groups to identify any disparities in the impact of different error message types, and find no significant difference in the effectiveness of GPT-generated error messages for students from varying socioeconomic and demographic backgrounds. Our findings point to GPT-generated error messages as the most helpful error message type, and as an intervention that is effective across demographics.

Natural Language Processing

Foundation Models

Generative AI

Response to U.S. AI Safety Institute’s Request for Comment on Managing Misuse Risk For Dual-Use Foundation Models

Rishi Bommasani, Alexander Wan, Yifan Mai, Percy Liang, Daniel E. Ho

Sep 09, 2024

Response to Request

Stanford scholars respond to a federal request for comment (RFC) on the U.S. AI Safety Institute’s draft guidelines for managing the misuse risk of dual-use foundation models.

Regulation, Policy, Governance

Foundation Models

Privacy, Safety, Security

All Work Published on Foundation Models

Stanford Research Teams Receive New Hoffman-Yee Grant Funding for 2025

pyvene: A Library for Understanding and Improving PyTorch Models via Interventions

What Makes a Good AI Benchmark?

Squashing ‘Fantastic Bugs’: Researchers Look to Fix Flaws in AI Benchmarks

A Large Scale RCT on Effective Error Messages in CS1

Response to U.S. AI Safety Institute’s Request for Comment on Managing Misuse Risk For Dual-Use Foundation Models