Stanford
University
  • Stanford Home
  • Maps & Directions
  • Search Stanford
  • Emergency Info
  • Terms of Use
  • Privacy
  • Copyright
  • Trademarks
  • Non-Discrimination
  • Accessibility
© Stanford University.  Stanford, California 94305.
Natural Language Processing | Stanford HAI

Stay Up To Date

Get the latest news, advances in research, policy work, and education program updates from HAI in your inbox weekly.

Sign Up For Latest News

Navigate
  • About
  • Events
  • Careers
  • Search
Participate
  • Get Involved
  • Support HAI
  • Contact Us
Skip to content
  • About

    • About
    • People
    • Get Involved with HAI
    • Support HAI
    • Subscribe to Email
  • Research

    • Research
    • Fellowship Programs
    • Grants
    • Student Affinity Groups
    • Centers & Labs
    • Research Publications
    • Research Partners
  • Education

    • Education
    • Executive and Professional Education
    • Government and Policymakers
    • K-12
    • Stanford Students
  • Policy

    • Policy
    • Policy Publications
    • Policymaker Education
    • Student Opportunities
  • AI Index

    • AI Index
    • AI Index Report
    • Global Vibrancy Tool
    • People
  • News
  • Events
  • Industry
  • Centers & Labs
Back to Natural Language Processing

All Work Published on Natural Language Processing

A Large Scale RCT on Effective Error Messages in CS1
Sierra Wang, John Mitchell, Christopher Piech
Mar 07, 2024
Research

In this paper, we evaluate the most effective error message types through a large-scale randomized controlled trial conducted in an open-access, online introductory computer science course with 8,762 students from 146 countries. We assess existing error message enhancement strategies, as well as two novel approaches of our own: (1) generating error messages using OpenAI's GPT in real time and (2) constructing error messages that incorporate the course discussion forum. By examining students' direct responses to error messages, and their behavior throughout the course, we quantitatively evaluate the immediate and longer term efficacy of different error message types. We find that students using GPT generated error messages repeat an error 23.1% less often in the subsequent attempt, and resolve an error in 34.8% fewer additional attempts, compared to students using standard error messages. We also perform an analysis across various demographics to understand any disparities in the impact of different error message types. Our results find no significant difference in the effectiveness of GPT generated error messages for students from varying socioeconomic and demographic backgrounds. Our findings underscore GPT generated error messages as the most helpful error message type, especially as a universally effective intervention across demographics.

A Large Scale RCT on Effective Error Messages in CS1

Sierra Wang, John Mitchell, Christopher Piech
Mar 07, 2024

In this paper, we evaluate the most effective error message types through a large-scale randomized controlled trial conducted in an open-access, online introductory computer science course with 8,762 students from 146 countries. We assess existing error message enhancement strategies, as well as two novel approaches of our own: (1) generating error messages using OpenAI's GPT in real time and (2) constructing error messages that incorporate the course discussion forum. By examining students' direct responses to error messages, and their behavior throughout the course, we quantitatively evaluate the immediate and longer term efficacy of different error message types. We find that students using GPT generated error messages repeat an error 23.1% less often in the subsequent attempt, and resolve an error in 34.8% fewer additional attempts, compared to students using standard error messages. We also perform an analysis across various demographics to understand any disparities in the impact of different error message types. Our results find no significant difference in the effectiveness of GPT generated error messages for students from varying socioeconomic and demographic backgrounds. Our findings underscore GPT generated error messages as the most helpful error message type, especially as a universally effective intervention across demographics.

Natural Language Processing
Foundation Models
Generative AI
Research
AI’s Fairness Problem: When Treating Everyone the Same is the Wrong Approach
Angelina Wang, Michelle Phan, Daniel E. Ho, Sanmi Koyejo
Feb 06, 2025
News

Current generative AI models struggle to recognize when demographic distinctions matter—leading to inaccurate, misleading, and sometimes harmful outcomes.

AI’s Fairness Problem: When Treating Everyone the Same is the Wrong Approach

Angelina Wang, Michelle Phan, Daniel E. Ho, Sanmi Koyejo
Feb 06, 2025

Current generative AI models struggle to recognize when demographic distinctions matter—leading to inaccurate, misleading, and sometimes harmful outcomes.

Machine Learning
Natural Language Processing
News
A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition
Tyler Benster, Guy Wilson, Reshef Elisha, Francis R. Willett, Shaul Druckmann
Mar 02, 2024
Research
Your browser does not support the video tag.

Silent Speech Interfaces (SSIs) offer a nonin- vasive alternative to brain-computer interfaces for soundless verbal communication. We in- troduce Multimodal Orofacial Neural Audio (MONA), a system that leverages cross-modal alignment through novel loss functions—cross- contrast (crossCon) and supervised temporal con- trast (supTcon)—to train a multimodal model with a shared latent representation. This archi- tecture enables the use of audio-only datasets like LibriSpeech to improve silent speech recog- nition. Additionally, our introduction of Large Language Model (LLM) Integrated Scoring Ad- justment (LISA) significantly improves recogni- tion accuracy. Together, MONA LISA reduces the state-of-the-art word error rate (WER) from 28.8% to 12.2% in the Gaddy (2020) benchmark dataset for silent speech on an open vocabulary. For vocal EMG recordings, our method improves the state-of-the-art from 23.3% to 3.7% WER. In the Brain-to-Text 2024 competition, LISA per- forms best, improving the top WER from 9.8% to 8.9%. To the best of our knowledge, this work represents the first instance where noninvasive silent speech recognition on an open vocabulary has cleared the threshold of 15% WER, demon- strating that SSIs can be a viable alternative to au- tomatic speech recognition (ASR). Our work not only narrows the performance gap between silent and vocalized speech but also opens new possi- bilities in human-computer interaction, demon- strating the potential of cross-modal approaches in noisy and data-limited regimes.

A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition

Tyler Benster, Guy Wilson, Reshef Elisha, Francis R. Willett, Shaul Druckmann
Mar 02, 2024

Silent Speech Interfaces (SSIs) offer a nonin- vasive alternative to brain-computer interfaces for soundless verbal communication. We in- troduce Multimodal Orofacial Neural Audio (MONA), a system that leverages cross-modal alignment through novel loss functions—cross- contrast (crossCon) and supervised temporal con- trast (supTcon)—to train a multimodal model with a shared latent representation. This archi- tecture enables the use of audio-only datasets like LibriSpeech to improve silent speech recog- nition. Additionally, our introduction of Large Language Model (LLM) Integrated Scoring Ad- justment (LISA) significantly improves recogni- tion accuracy. Together, MONA LISA reduces the state-of-the-art word error rate (WER) from 28.8% to 12.2% in the Gaddy (2020) benchmark dataset for silent speech on an open vocabulary. For vocal EMG recordings, our method improves the state-of-the-art from 23.3% to 3.7% WER. In the Brain-to-Text 2024 competition, LISA per- forms best, improving the top WER from 9.8% to 8.9%. To the best of our knowledge, this work represents the first instance where noninvasive silent speech recognition on an open vocabulary has cleared the threshold of 15% WER, demon- strating that SSIs can be a viable alternative to au- tomatic speech recognition (ASR). Our work not only narrows the performance gap between silent and vocalized speech but also opens new possi- bilities in human-computer interaction, demon- strating the potential of cross-modal approaches in noisy and data-limited regimes.

Natural Language Processing
Machine Learning
Foundation Models
Your browser does not support the video tag.
Research
Large Language Models Just Want To Be Liked
Jan 13, 2025
News

When LLMs take surveys on personality traits, they, like people, exhibit a desire to appear likable. 

Large Language Models Just Want To Be Liked

Jan 13, 2025

When LLMs take surveys on personality traits, they, like people, exhibit a desire to appear likable. 

Natural Language Processing
Foundation Models
Generative AI
News
How Persuasive Is AI-generated Propaganda?
Josh A. Goldstein, Jason Chao, Shelby Grossman, Alex Stamos, Michael Tomz
Feb 20, 2024
Research

Can large language models, a form of artificial intelligence (AI), generate persuasive propaganda? We conducted a preregistered survey experiment of US respondents to investigate the persuasiveness of news articles written by foreign propagandists compared to content generated by GPT-3 davinci (a large language model). We found that GPT-3 can create highly persuasive text as measured by participants’ agreement with propaganda theses. We further investigated whether a person fluent in English could improve propaganda persuasiveness. Editing the prompt fed to GPT-3 and/or curating GPT-3’s output made GPT-3 even more persuasive, and, under certain conditions, as persuasive as the original propaganda. Our findings suggest that propagandists could use AI to create convincing content with limited effort.

How Persuasive Is AI-generated Propaganda?

Josh A. Goldstein, Jason Chao, Shelby Grossman, Alex Stamos, Michael Tomz
Feb 20, 2024

Can large language models, a form of artificial intelligence (AI), generate persuasive propaganda? We conducted a preregistered survey experiment of US respondents to investigate the persuasiveness of news articles written by foreign propagandists compared to content generated by GPT-3 davinci (a large language model). We found that GPT-3 can create highly persuasive text as measured by participants’ agreement with propaganda theses. We further investigated whether a person fluent in English could improve propaganda persuasiveness. Editing the prompt fed to GPT-3 and/or curating GPT-3’s output made GPT-3 even more persuasive, and, under certain conditions, as persuasive as the original propaganda. Our findings suggest that propagandists could use AI to create convincing content with limited effort.

Natural Language Processing
Foundation Models
Generative AI
Research
Can AI Hold Consistent Values? Stanford Researchers Probe LLM Consistency and Bias
Andrew Myers
Nov 11, 2024
News

New research tests large language models for consistency across diverse topics, revealing that while they handle neutral topics reliably, controversial issues lead to varied answers.

Can AI Hold Consistent Values? Stanford Researchers Probe LLM Consistency and Bias

Andrew Myers
Nov 11, 2024

New research tests large language models for consistency across diverse topics, revealing that while they handle neutral topics reliably, controversial issues lead to varied answers.

Ethics, Equity, Inclusion
Natural Language Processing
Privacy, Safety, Security
News
1
2
3
4
5