Natural Language Processing

NLP is changing how we interact with machines, enabling more fluid communication and better understanding of human language.

An Open-Source AI Agent for Doing Tasks on the Web

Katharine Miller

Mar 27, 2025

News

NNetNav learns how to navigate websites by mimicking childhood learning through exploration.

Machine Learning; Natural Language Processing

The Promise and Perils of Artificial Intelligence in Advancing Participatory Science and Health Equity in Public Health

Abby C King, Zakaria N Doueiri, Ankita Kaulberg, Lisa Goldman Rosas

Feb 14, 2025

Research

Current societal trends reflect an increased mistrust in science and a lowered civic engagement that threaten to impair research that is foundational for ensuring public health and advancing health equity. One effective countermeasure to these trends lies in community-facing citizen science applications to increase public participation in scientific research, making this field an important target for artificial intelligence (AI) exploration. We highlight potentially promising citizen science AI applications that extend beyond individual use to the community level, including conversational large language models, text-to-image generative AI tools, descriptive analytics for analyzing integrated macro- and micro-level data, and predictive analytics. The novel adaptations of AI technologies for community-engaged participatory research also bring an array of potential risks. We highlight possible negative externalities and mitigations for some of the potential ethical and societal challenges in this field.

Foundation Models; Generative AI; Machine Learning; Natural Language Processing; Sciences (Social, Health, Biological, Physical); Healthcare

Safety Risks from Customizing Foundation Models via Fine-Tuning

Peter Henderson, Xiangyu Qi, Yi Zeng, Tinghao Xie, Pin-Yu Chen, Ruoxi Jia, Prateek Mittal

Jan 08, 2024

Policy Brief

This brief underscores the safety risks inherent in custom fine-tuning of large language models.

Natural Language Processing; Foundation Models

Christopher Manning

Person

Oct 05

Natural Language Processing

Chatbots, Like the Rest of Us, Just Want to Be Loved

Wired

Mar 05, 2025

Media Mention

A study led by Stanford HAI Faculty Fellow Johannes Eichstaedt reveals that large language models adapt their behavior to appear more likable when they are being studied, mirroring human tendencies to present favorably.

Natural Language Processing; Machine Learning; Generative AI; Foundation Models

LABOR-LLM: Language-Based Occupational Representations with Large Language Models

Susan Athey, Herman Brunborg, Tianyu Du, Ayush Kanodia, Keyon Vafa

Dec 11, 2024

Research

Vafa et al. (2024) introduced a transformer-based econometric model, CAREER, that predicts a worker’s next job as a function of career history (an “occupation model”). CAREER was initially estimated (“pre-trained”) using a large, unrepresentative resume dataset, which served as a “foundation model,” and parameter estimation was continued (“fine-tuned”) using data from a representative survey. CAREER had better predictive performance than benchmarks. This paper considers an alternative where the resume-based foundation model is replaced by a large language model (LLM). We convert tabular data from the survey into text files that resemble resumes and fine-tune the LLMs on these text files with the objective of predicting the next token (word). The resulting fine-tuned LLM is used as an input to an occupation model. Its predictive performance surpasses all prior models. We demonstrate the value of fine-tuning and further show that, by adding more career data from a different population, fine-tuning smaller LLMs surpasses the performance of fine-tuning larger models.
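As a rough illustration of the tabular-to-text step described in this abstract, the sketch below renders one worker's career rows as a resume-like string and builds a prompt that stops where the next occupation would be predicted. The `JobSpell` record, the field names, and the text template are all hypothetical; the abstract does not specify the authors' actual format.

```python
from dataclasses import dataclass


@dataclass
class JobSpell:
    """One row of a hypothetical career-history table (assumed schema)."""
    year: int
    occupation: str


def career_to_resume(education: str, spells: list[JobSpell]) -> str:
    """Render one worker's tabular history as resume-like text, oldest job first."""
    lines = [f"Education: {education}", "Work history:"]
    for spell in sorted(spells, key=lambda s: s.year):
        lines.append(f"{spell.year}: {spell.occupation}")
    return "\n".join(lines)


def next_job_prompt(education: str, spells: list[JobSpell]) -> str:
    """Build a text prefix that ends where the next occupation would be generated."""
    last_year = max(s.year for s in spells)
    return career_to_resume(education, spells) + f"\n{last_year + 1}:"


if __name__ == "__main__":
    history = [JobSpell(2003, "Cashier"), JobSpell(2001, "Student")]
    print(next_job_prompt("High school diploma", history))
```

A language model fine-tuned on such text with a next-token objective would then be asked to continue the final line; how the continuation is mapped back to an occupation category is left out of this sketch.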

Foundation Models; Natural Language Processing

All Work Published on Natural Language Processing

AI’s Fairness Problem: When Treating Everyone the Same is the Wrong Approach

Angelina Wang, Michelle Phan, Daniel E. Ho, Sanmi Koyejo

Feb 06, 2025

News

Current generative AI models struggle to recognize when demographic distinctions matter—leading to inaccurate, misleading, and sometimes harmful outcomes.

Machine Learning; Natural Language Processing

Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs

Krista Opsahl-Ong, Michael J Ryan, Josh Purtell, David Broman, Christopher Potts, Matei Zaharia, Omar Khattab

Nov 14, 2024

Research

Language Model Programs, i.e., sophisticated pipelines of modular language model (LM) calls, are increasingly advancing NLP tasks, but they require crafting prompts that are jointly effective for all modules. We study prompt optimization for LM programs, i.e., how to update these prompts to maximize a downstream metric without access to module-level labels or gradients. To make this tractable, we factorize our problem into optimizing the free-form instructions and few-shot demonstrations of every module and introduce several strategies to craft task-grounded instructions and navigate credit assignment across modules. Our strategies include (i) program- and data-aware techniques for proposing effective instructions, (ii) a stochastic mini-batch evaluation function for learning a surrogate model of our objective, and (iii) a meta-optimization procedure in which we refine how LMs construct proposals over time. Using these insights, we develop MIPRO, a novel algorithm for optimizing LM programs. MIPRO outperforms baseline optimizers on five of seven diverse multi-stage LM programs using a best-in-class open-source model (Llama-3-8B), by as much as 13% accuracy. We have released our new optimizers and benchmark in DSPy at [http://dspy.ai](http://dspy.ai).

Natural Language Processing

Percy Liang

Person

Oct 05, 2024

Foundation Models; Generative AI; Machine Learning; Natural Language Processing

Large Language Models Just Want To Be Liked

Jan 13, 2025

News

When LLMs take surveys on personality traits, they, like people, exhibit a desire to appear likable.

Natural Language Processing; Foundation Models; Generative AI

ReMix: Optimizing Data Mixtures for Large Scale Imitation Learning

Joey Hejna, Chethan Anand Bhateja, Yichen Jiang, Karl Pertsch, Dorsa Sadigh

Sep 05, 2024

Research

Increasingly large robotics datasets are being collected to train larger foundation models in robotics. However, despite the fact that data selection has been of utmost importance to scaling in vision and natural language processing (NLP), little work in robotics has questioned what data such models should actually be trained on. In this work we investigate how to weigh different subsets or "domains" of robotics datasets during pre-training to maximize worst-case performance across all possible downstream domains using distributionally robust optimization (DRO). Unlike in NLP, we find that these methods are hard to apply out of the box due to varying action spaces and dynamics across robots. Our method, ReMix, employs early stopping and action normalization and discretization to counteract these issues. Through extensive experimentation on both the Bridge and OpenX datasets, we demonstrate that data curation can have an outsized impact on downstream performance. Specifically, domain weights learned by ReMix outperform uniform weights by over 40% on average and human-selected weights by over 20% on datasets used to train the RT-X models.
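To give a feel for how a DRO-style procedure can reweight domains, the sketch below applies one generic exponentiated-gradient step that shifts weight toward domains with higher loss. This is a textbook-style update offered as an assumption, not ReMix's actual procedure; the function name, the `step_size` parameter, and the example losses are hypothetical, and the early stopping and action normalization mentioned in the abstract are not modeled.

```python
import math


def update_domain_weights(weights, losses, step_size=1.0):
    """One exponentiated-gradient step: upweight domains with higher loss.

    `weights` is a list of non-negative domain weights summing to 1 and
    `losses` holds one (hypothetical) per-domain loss value for each weight.
    """
    scaled = [w * math.exp(step_size * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    # Renormalize so the result is again a probability distribution over domains.
    return [s / total for s in scaled]


if __name__ == "__main__":
    uniform = [0.5, 0.5]
    print(update_domain_weights(uniform, losses=[1.0, 0.0]))
```

Repeating such a step while the model trains concentrates weight on the worst-performing domains, which is the intuition behind optimizing for worst-case rather than average performance.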

Computer Vision; Robotics; Natural Language Processing

Can AI Hold Consistent Values? Stanford Researchers Probe LLM Consistency and Bias

Andrew Myers

Nov 11, 2024

News

New research tests large language models for consistency across diverse topics, revealing that while they handle neutral topics reliably, controversial issues lead to varied answers.

Ethics, Equity, Inclusion; Natural Language Processing; Privacy, Safety, Security
