Policy Brief

Simulating Human Behavior with AI Agents

Date: May 20, 2025
Topics: Generative AI
Abstract

This brief introduces a generative AI agent architecture that can simulate the attitudes of more than 1,000 real people in response to major social science survey questions.

Key Takeaways

  • Simulating human attitudes and behaviors could enable researchers to test interventions and theories and gain real-world insights.

  • We built an AI agent architecture that can simulate real people in ways far more complex than traditional approaches. Using this architecture, we created generative agents that simulate 1,000 individuals, each using an LLM paired with an in-depth interview transcript of the individual.

  • To test these generative agents, we evaluated the agents’ responses against the corresponding person’s responses to major social science surveys and experiments. We found that the agents replicated real participants’ responses 85% as accurately as the individuals replicated their own answers two weeks later on the General Social Survey.

  • Because these generative agents hold sensitive data and can mimic individual behavior, policymakers and researchers must work together to ensure that appropriate monitoring and consent mechanisms are used to help mitigate risks while also harnessing potential benefits.

Executive Summary

AI agents have been gaining widespread attention among the general public as AI systems that can pursue complex goals and directly take actions in both virtual and real-world environments. Today, people can use AI agents to make payments, reserve flights, and place grocery orders for them, and there is great excitement about the potential for AI agents to manage even more sophisticated tasks.

However, a different type of AI agent—a simulation of human behaviors and attitudes—is also on the rise. These simulation agents aim to help researchers ask "what if" questions about how people might respond to a range of social, political, or informational contexts. If these agents achieve high accuracy, they could enable researchers to test a broad set of interventions and theories, such as how people would react to new public health messages, product launches, or major economic or political shocks. Across economics, sociology, organizational research, and political science, new ways of simulating individual behavior—and the behavior of groups of individuals—could help expand our understanding of social interactions, institutions, and networks. While work on these kinds of agents is progressing, current architectures still have some distance to cover before they can be used reliably.

In our paper, “Generative Agent Simulations of 1,000 People,” we introduce an AI agent architecture that simulates more than 1,000 real people. The agent architecture—built by combining the transcripts of two-hour, qualitative interviews with a large language model (LLM) and scored against social science benchmarks—successfully replicated real individuals’ responses to survey questions 85% as accurately as participants replicate their own answers across surveys staggered two weeks apart. The generative agents performed comparably in predicting people’s personality traits and experiment outcomes and were less biased than previously used simulation tools.

This architecture underscores the benefits of using generative agents as a research tool to glean new insights into real-world individual behavior. However, researchers and policymakers must also mitigate the risks of using generative agents in such contexts, including harms related to over-reliance on agents, privacy, and reputation.

Introduction

Simulations in which agents are used to model the behaviors and interactions of individuals have been a popular tool for empirical social research for years, even before the emergence of AI agents. Traditional approaches to building agent architectures, such as agent-based models or game theory, rely on clear sets of rules and environments manually specified by the researchers. While these rules make it relatively easy to interpret results, they also limit the contexts in which traditional agents can act while oversimplifying the real-life complexity of human behavior. This, in turn, can limit the generalizability and accuracy of the simulation results.

Generative AI models offer the opportunity to build general-purpose agents that can simulate behaviors across a variety of contexts. To create simulations that better reflect the myriad, often idiosyncratic factors that influence individuals’ attitudes, beliefs, and behaviors, we built a novel generative agent architecture that combines LLMs with in-depth interviews with real individuals.

We recruited 1,052 individuals—representative of the U.S. population across age, gender, race, region, education, and political ideology—to participate in two-hour qualitative interviews. These in-depth interviews, which included both pre-specified questions and adaptive follow-up questions, are a foundational social science method that researchers have used to predict life outcomes beyond what can be learned from traditional surveys and demographic instruments. We also developed an AI interviewer to ask participants questions based on a semi-structured interview protocol from the American Voices Project, with topics ranging from life stories to views on current social issues.
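
As a rough illustration of how a semi-structured AI interviewer can combine pre-specified protocol questions with adaptive follow-ups, consider the sketch below. The loop structure, the follow-up prompt wording, and the `ask_llm` and `get_answer` callables are hypothetical simplifications for this brief, not the interviewer used in the study.

```python
# Simplified sketch of a semi-structured AI interviewer loop: it walks through
# pre-specified protocol questions and asks an LLM to propose one adaptive
# follow-up per answer. Prompt wording and helpers are illustrative assumptions.
from typing import Callable

def run_interview(
    protocol_questions: list[str],
    get_answer: Callable[[str], str],   # collects the participant's reply to a question
    ask_llm: Callable[[str], str],      # any LLM completion wrapper
) -> list[tuple[str, str]]:
    transcript: list[tuple[str, str]] = []
    for question in protocol_questions:
        answer = get_answer(question)
        transcript.append((question, answer))

        # Generate one adaptive follow-up grounded in the participant's answer.
        follow_up = ask_llm(
            "You are conducting a semi-structured qualitative interview.\n"
            f"Protocol question: {question}\n"
            f"Participant answer: {answer}\n"
            "Write one brief, neutral follow-up question that probes for more detail."
        ).strip()
        transcript.append((follow_up, get_answer(follow_up)))
    return transcript
```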

Then, we built the generative agents based on participants’ full interview transcripts and an LLM. When a generative agent was queried, the full transcript was injected into the model prompt, which instructed the model to imitate the relevant individual when responding to questions, including forced-choice prompts, surveys, and multi-stage interactional settings.
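
To make the querying step concrete, here is a minimal sketch of how a transcript-conditioned agent might answer a forced-choice survey item. The prompt wording, the `build_agent_prompt` and `query_agent` helpers, and the `complete` callable (a stand-in for any LLM chat-completion call) are illustrative assumptions, not the exact prompt or code used in the paper.

```python
# Minimal sketch of querying a transcript-conditioned generative agent.
# The prompt text here is illustrative only.
from typing import Callable

def build_agent_prompt(transcript: str, question: str, options: list[str]) -> str:
    """Inject the participant's full interview transcript and ask the model
    to answer a forced-choice survey item as that person would."""
    numbered = "\n".join(f"{i + 1}. {opt}" for i, opt in enumerate(options))
    return (
        "Below is the transcript of a two-hour interview with a participant.\n"
        f"--- INTERVIEW TRANSCRIPT ---\n{transcript}\n--- END TRANSCRIPT ---\n\n"
        "Answer the following survey question exactly as this participant would, "
        "responding with only the number of the chosen option.\n\n"
        f"Question: {question}\nOptions:\n{numbered}\nAnswer:"
    )

def query_agent(
    complete: Callable[[str], str],  # wrapper around an LLM chat/completion API
    transcript: str,
    question: str,
    options: list[str],
) -> str:
    prompt = build_agent_prompt(transcript, question, options)
    raw = complete(prompt).strip()
    # Assumes the model follows instructions and returns something like "2" or "2."
    idx = int(raw.split()[0].rstrip(".")) - 1
    return options[idx]
```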

Once the generative agents were in place, we evaluated them on their ability to predict participants’ responses to common social science surveys and experiments, which the participants completed after their in-depth interviews. We tested on the core module of the General Social Survey (widely used to assess survey respondents’ demographic backgrounds, behaviors, attitudes, and beliefs); the 44-item Big Five Inventory (designed to assess an individual’s personality); five well-known behavioral economic games (the dictator game, first- and second-player trust games, public goods game, and prisoner’s dilemma); and five social science experiments with control and treatment conditions. For the General Social Survey, which has categorical responses, we measured accuracy and correlation based on whether the agent selected the same survey response as the person. For the Big Five Inventory and the economic games, which have continuous responses, we assessed accuracy using mean absolute error, along with correlation.
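
The scoring described above can be illustrated with a short sketch of the normalized-accuracy idea (agent accuracy divided by the participant's own two-week self-consistency, the basis of the 85% figure) and of mean absolute error for continuous measures. The function names and data layout below are assumptions for illustration, not the paper's evaluation code.

```python
# Illustrative sketch of the evaluation metrics described above.
from statistics import mean

def raw_accuracy(predicted: list[str], reference: list[str]) -> float:
    """Fraction of categorical items (e.g., GSS questions) where the predicted
    response matches the reference response."""
    matches = [a == b for a, b in zip(predicted, reference)]
    return sum(matches) / len(matches)

def normalized_accuracy(
    agent_answers: list[str],
    wave1_answers: list[str],   # participant's original survey responses
    wave2_answers: list[str],   # same participant's responses two weeks later
) -> float:
    """Agent accuracy against the participant's original answers, divided by how
    consistently the participant replicated their own answers two weeks later."""
    agent_acc = raw_accuracy(agent_answers, wave1_answers)
    self_consistency = raw_accuracy(wave2_answers, wave1_answers)
    return agent_acc / self_consistency

def mean_absolute_error(agent_scores: list[float], person_scores: list[float]) -> float:
    """MAE for continuous measures such as Big Five scores or economic-game allocations."""
    return mean(abs(a - p) for a, p in zip(agent_scores, person_scores))
```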

Authors
  • Joon Sung Park
  • Carolyn Q. Zou
  • Aaron Shaw
  • Benjamin Mako Hill
  • Carrie J. Cai
  • Meredith Ringel Morris
  • Robb Willer
  • Percy Liang
  • Michael S. Bernstein
