A Large Scale RCT on Effective Error Messages in CS1 | Stanford HAI


Date
March 07, 2024
Topics
Natural Language Processing
Foundation Models
Generative AI
abstract

In this paper, we evaluate which error message types are most effective through a large-scale randomized controlled trial conducted in an open-access, online introductory computer science course with 8,762 students from 146 countries. We assess existing error message enhancement strategies, as well as two novel approaches of our own: (1) generating error messages using OpenAI's GPT in real time and (2) constructing error messages that incorporate the course discussion forum. By examining students' direct responses to error messages, and their behavior throughout the course, we quantitatively evaluate the immediate and longer-term efficacy of different error message types. We find that students using GPT-generated error messages repeat an error 23.1% less often in the subsequent attempt, and resolve an error in 34.8% fewer additional attempts, compared to students using standard error messages. We also perform an analysis across various demographics to understand any disparities in the impact of different error message types. Our results show no significant difference in the effectiveness of GPT-generated error messages for students from varying socioeconomic and demographic backgrounds. Our findings underscore GPT-generated error messages as the most helpful error message type, especially as a universally effective intervention across demographics.
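
The two headline figures above are relative reductions against the standard-message group. The abstract does not give the exact metric definitions, so the following is only a minimal sketch of one plausible reading: a repeat rate (share of errors that recur on the very next attempt) and a mean count of additional attempts until the error is resolved, each compared as (control - treatment) / control. The `ErrorEpisode` record, the function names, and the toy data are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ErrorEpisode:
    """Hypothetical record of one error a student encountered."""
    repeated_on_next_attempt: bool       # did the same error recur on the next run?
    additional_attempts_to_resolve: int  # further runs until the error was gone


def repeat_rate(episodes: Sequence[ErrorEpisode]) -> float:
    """Fraction of episodes in which the error recurred on the next attempt."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(e.repeated_on_next_attempt for e in episodes) / len(episodes)


def mean_additional_attempts(episodes: Sequence[ErrorEpisode]) -> float:
    """Average number of additional attempts needed to resolve an error."""
    if not episodes:
        raise ValueError("need at least one episode")
    return sum(e.additional_attempts_to_resolve for e in episodes) / len(episodes)


def relative_reduction(control: float, treatment: float) -> float:
    """Reduction of the treatment value relative to control: (c - t) / c."""
    if control <= 0:
        raise ValueError("control value must be positive")
    return (control - treatment) / control


# Made-up toy data for illustration only; these are NOT the study's numbers.
standard = [
    ErrorEpisode(True, 4),
    ErrorEpisode(True, 2),
    ErrorEpisode(False, 1),
    ErrorEpisode(False, 1),
]
gpt = [
    ErrorEpisode(True, 2),
    ErrorEpisode(False, 1),
    ErrorEpisode(False, 1),
    ErrorEpisode(False, 1),
]

repeat_drop = relative_reduction(repeat_rate(standard), repeat_rate(gpt))
attempts_drop = relative_reduction(
    mean_additional_attempts(standard), mean_additional_attempts(gpt)
)
print(f"repeat rate reduced by {repeat_drop:.1%}")
print(f"additional attempts reduced by {attempts_drop:.1%}")
```

Under this reading, a reported 23.1% reduction would mean the treatment group's repeat rate is 76.9% of the control group's rate, not that it fell by 23.1 percentage points.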

Authors
  • Sierra Wang
  • John Mitchell
  • Christopher Piech
Related
  • Closed
    HAI and Accelerator for Learning Partnership Grant

    Created to explore how generative AI can be applied in novel ways to support learning

Related Publications

Stories for the Future 2024
Isabelle Levent
Deep Dive
Mar 31, 2025
Research

We invited 11 sci-fi filmmakers and AI researchers to Stanford for Stories for the Future, a day-and-a-half experiment in fostering new narratives about AI. Researchers shared perspectives on AI and filmmakers reflected on the challenges of writing AI narratives. Together researcher-writer pairs transformed a research paper into a written scene. The challenge? Each scene had to include an AI manifestation, but could not be about the personhood of AI or AI as a threat. Read the results of this project.

The Promise and Perils of Artificial Intelligence in Advancing Participatory Science and Health Equity in Public Health
Abby C King, Zakaria N Doueiri, Ankita Kaulberg, Lisa Goldman Rosas
Feb 14, 2025
Research

Current societal trends reflect an increased mistrust in science and a lowered civic engagement that threaten to impair research that is foundational for ensuring public health and advancing health equity. One effective countermeasure to these trends lies in community-facing citizen science applications to increase public participation in scientific research, making this field an important target for artificial intelligence (AI) exploration. We highlight potentially promising citizen science AI applications that extend beyond individual use to the community level, including conversational large language models, text-to-image generative AI tools, descriptive analytics for analyzing integrated macro- and micro-level data, and predictive analytics. The novel adaptations of AI technologies for community-engaged participatory research also bring an array of potential risks. We highlight possible negative externalities and mitigations for some of the potential ethical and societal challenges in this field.

Policy-Shaped Prediction: Avoiding Distractions in Model-Based Reinforcement Learning
Nicholas Haber, Miles Huston, Isaac Kauvar
Dec 13, 2024
Research

Model-based reinforcement learning (MBRL) is a promising route to sample-efficient policy optimization. However, a known vulnerability of reconstruction-based MBRL consists of scenarios in which detailed aspects of the world are highly predictable, but irrelevant to learning a good policy. Such scenarios can lead the model to exhaust its capacity on meaningless content, at the cost of neglecting important environment dynamics. While existing approaches attempt to solve this problem, we highlight its continuing impact on leading MBRL methods, including DreamerV3 and DreamerPro, with a novel environment where background distractions are intricate, predictable, and useless for planning future actions. To address this challenge we develop a method for focusing the capacity of the world model through synergy of a pretrained segmentation model, a task-aware reconstruction loss, and adversarial learning. Our method outperforms a variety of other approaches designed to reduce the impact of distractors, and is an advance towards robust model-based reinforcement learning.

LABOR-LLM: Language-Based Occupational Representations with Large Language Models
Susan Athey, Herman Brunborg, Tianyu Du, Ayush Kanodia, Keyon Vafa
Dec 11, 2024
Research

Vafa et al. (2024) introduced a transformer-based econometric model, CAREER, that predicts a worker’s next job as a function of career history (an “occupation model”). CAREER was initially estimated (“pre-trained”) using a large, unrepresentative resume dataset, which served as a “foundation model,” and parameter estimation was continued (“fine-tuned”) using data from a representative survey. CAREER had better predictive performance than benchmarks. This paper considers an alternative where the resume-based foundation model is replaced by a large language model (LLM). We convert tabular data from the survey into text files that resemble resumes and fine-tune the LLMs using these text files with the objective to predict the next token (word). The resulting fine-tuned LLM is used as an input to an occupation model. Its predictive performance surpasses all prior models. We demonstrate the value of fine-tuning and further show that by adding more career data from a different population, fine-tuning smaller LLMs surpasses the performance of fine-tuning larger models.
