What is a Large Language Model (LLM)?

Toward Responsible Development and Evaluation of LLMs in Psychotherapy

Elizabeth C. Stade, Shannon Wiltsey Stirman, Lyle Ungar, Cody L. Boland, H. Andrew Schwartz, David B. Yaden, João Sedoc, Robert J. DeRubeis, Robb Willer, Jane P. Kim, Johannes Eichstaedt

Quick Read · Jun 13

policy brief

This brief reviews the current landscape of LLMs developed for psychotherapy and proposes a framework for evaluating the readiness of these AI tools for clinical deployment.

Healthcare

Stanford debuts first AI benchmark to help understand LLMs

Sharon Goldman

Nov 17

media mention

HAI’s Center for Research on Foundation Models launches Holistic Evaluation of Language Models (HELM), the first benchmarking project aimed at improving the transparency of language models and the broader category of foundation models.

Mind the (Language) Gap: Mapping the Challenges of LLM Development in Low-Resource Language Contexts

Juan N. Pava, Caroline Meinhardt, Haifa Badi Uz Zaman, Toni Friedman, Sang T. Truong, Daniel Zhang, Elena Cryst, Vukosi Marivate, Sanmi Koyejo

Deep Dive · Apr 22

whitepaper

This white paper maps the LLM development landscape for low-resource languages, highlighting challenges, trade-offs, and strategies to increase investment; prioritize cross-disciplinary, community-driven development; and ensure fair data ownership.

International Affairs, International Security, International Development

Natural Language Processing

Ethics, Equity, Inclusion

Escalation Risks from LLMs in Military and Diplomatic Contexts

Juan-Pablo Rivera, Gabriel Mukobi, Anka Reuel, Max Lamparth, Chandler Smith, Jacquelyn Schneider

Quick Read · May 02

policy brief

This brief presents the results of a wargame simulation that aims to evaluate the escalation risks of large language models (LLMs) in high-stakes military and diplomatic decision-making.

International Affairs, International Security, International Development

Can AI Hold Consistent Values? Stanford Researchers Probe LLM Consistency and Bias

Andrew Myers

Nov 11

news

New research tests large language models for consistency across diverse topics, revealing that while they handle neutral topics reliably, controversial issues lead to varied answers.

Ethics, Equity, Inclusion

Natural Language Processing

Privacy, Safety, Security

A Cross-Modal Approach to Silent Speech with LLM-Enhanced Recognition

Tyler Benster, Guy Wilson, Reshef Elisha, Francis R. Willett, Shaul Druckmann

Mar 02

Research

Silent Speech Interfaces (SSIs) offer a noninvasive alternative to brain-computer interfaces for soundless verbal communication. We introduce Multimodal Orofacial Neural Audio (MONA), a system that leverages cross-modal alignment through novel loss functions, cross-contrast (crossCon) and supervised temporal contrast (supTcon), to train a multimodal model with a shared latent representation. This architecture enables the use of audio-only datasets like LibriSpeech to improve silent speech recognition. Additionally, our introduction of Large Language Model (LLM) Integrated Scoring Adjustment (LISA) significantly improves recognition accuracy. Together, MONA LISA reduces the state-of-the-art word error rate (WER) from 28.8% to 12.2% in the Gaddy (2020) benchmark dataset for silent speech on an open vocabulary. For vocal EMG recordings, our method improves the state-of-the-art from 23.3% to 3.7% WER. In the Brain-to-Text 2024 competition, LISA performs best, improving the top WER from 9.8% to 8.9%. To the best of our knowledge, this work represents the first instance where noninvasive silent speech recognition on an open vocabulary has cleared the threshold of 15% WER, demonstrating that SSIs can be a viable alternative to automatic speech recognition (ASR). Our work not only narrows the performance gap between silent and vocalized speech but also opens new possibilities in human-computer interaction, demonstrating the potential of cross-modal approaches in noisy and data-limited regimes.

Natural Language Processing

Machine Learning

Foundation Models

LABOR-LLM: Language-Based Occupational Representations with Large Language Models

Susan Athey, Herman Brunborg, Tianyu Du, Ayush Kanodia, Keyon Vafa

Dec 11

Research

Vafa et al. (2024) introduced a transformer-based econometric model, CAREER, that predicts a worker’s next job as a function of career history (an “occupation model”). CAREER was initially estimated (“pre-trained”) using a large, unrepresentative resume dataset, which served as a “foundation model,” and parameter estimation was continued (“fine-tuned”) using data from a representative survey. CAREER had better predictive performance than benchmarks. This paper considers an alternative where the resume-based foundation model is replaced by a large language model (LLM). We convert tabular data from the survey into text files that resemble resumes and fine-tune the LLMs using these text files with the objective to predict the next token (word). The resulting fine-tuned LLM is used as an input to an occupation model. Its predictive performance surpasses all prior models. We demonstrate the value of fine-tuning and further show that by adding more career data from a different population, fine-tuning smaller LLMs surpasses the performance of fine-tuning larger models.

Foundation Models

Natural Language Processing
