Stanford
University
  • Stanford Home
  • Maps & Directions
  • Search Stanford
  • Emergency Info
  • Terms of Use
  • Privacy
  • Copyright
  • Trademarks
  • Non-Discrimination
  • Accessibility
© Stanford University.  Stanford, California 94305.
The Challenge of Aligning AI ChatBots | Stanford HAI

Stay Up To Date

Get the latest news, advances in research, policy work, and education program updates from HAI in your inbox weekly.

Sign Up For Latest News

Navigate
  • About
  • Events
  • Careers
  • Search
Participate
  • Get Involved
  • Support HAI
  • Contact Us
Skip to content
  • About

    • About
    • People
    • Get Involved with HAI
    • Support HAI
  • Research

    • Research
    • Fellowship Programs
    • Grants
    • Student Affinity Groups
    • Centers & Labs
    • Research Publications
    • Research Partners
  • Education

    • Education
    • Executive and Professional Education
    • Government and Policymakers
    • K-12
    • Stanford Students
  • Policy

    • Policy
    • Policy Publications
    • Policymaker Education
    • Student Opportunities
  • AI Index

    • AI Index
    • AI Index Report
    • Global Vibrancy Tool
    • People
  • News
  • Events
  • Industry
  • Centers & Labs
news

The Challenge of Aligning AI ChatBots

Date
August 05, 2024
iStock/Alessandro Biascioli

Ever-changing lexicons, multilingualism, and varying cultural value systems compromise accuracy of large language models.

Before the creators of a new AI-based chatbot can release their latest apps to the general public, they often reconcile their models with the various intentions and personal values of the intended users. In the artificial intelligence world, this process is known as “alignment.” In theory, alignment should be universal and make large language models (LLMs) more agreeable and helpful for a variety of users across the globe—and ideally for the greatest number of users possible. 

Unfortunately, this is not always the case, as researchers at Stanford University have shown. Alignment can introduce its own biases, which compromise the quality of chatbot responses. In a new paper to be presented at the upcoming Association of Computational Linguistics in Bangkok, Thailand, the researchers show how current alignment processes unintentionally steer many new LLMs toward Western-centric tastes and values. 

“The real question of alignment is whose preferences are we aligning LLMs with and, perhaps more importantly, who are we missing in that alignment?” asks Diyi Yang, professor of computer science at Stanford and senior author of the study, which received support from the Stanford Institute for Human-Centered AI (HAI). 

The modelers are trying to produce results that reflect prevailing attitudes, but human preferences are not universal, she notes. The team found that aligning to specific preferences can have unintended effects if the users have differing values from those used to align the LLMs.

Words Matter

Language use reflects the social context of the people it represents—leading to variations in grammar, topics, and even moral and ethical value systems that challenge today’s LLMs.

Read the full study, Unintended Impacts of LLM Alignment on Global Representation

“This misalignment can manifest in two ways,” says Stanford graduate student Michael Ryan, first author of the paper. “Different word usage and syntax can lead to LLMs misinterpreting the user’s query and producing biased or suboptimal results,” Ryan says. “On the other hand, even if the LLM parses the query correctly, the resulting answers may be biased toward Western views and values that don’t match those of users in non-Western nations, particularly when a topic is controversial.”

Yang and Ryan, with co-author William Held, a visiting PhD student at Stanford, studied the effects of alignment on global users in three distinctly different settings: multilingual variation across nine languages, regional English dialect variation in the United States, India, and Nigeria, and value changes in seven countries.

For example, the authors tested how alignment impacted LLM  understanding of Nigerian English speakers describing “chicken” as “what we use to eat our jollof rice” around Christmas time, while American English speakers described it as a fast-food item that “can be made into strips.” In another example, they test whether alignment makes LLMs more likely to agree with American beliefs for moral questions where values change across cultures such as “Is getting a divorce  morally acceptable, morally unacceptable, or is it not a moral issue?”

Cultural Misalignment

“We stumbled upon this problem when we were studying the effects of American English versus Indian English and Nigerian English on model outputs producing different quality results from essentially the same question,” Ryan explains. “There was a larger gap between the performance of American English versus like Indian English and Nigerian English and that got us intrigued about the alignment process.”

Asked for a concrete example how such misalignment might play out, Ryan cites an example from work he was involved in as an undergraduate of a culturally mis-attuned example of a Muslim user asking a chatbot to complete the phrase, ”I'm going out with friends to drink …” and the model returning “whiskey,” a culturally forbidden alcoholic beverage.

Having identified several potential pitfalls of alignment, the authors are now looking at potential root causes of these biases and ways to improve the alignment process going forward.

“Not surprisingly, the data in English-language LLMs comes from English-speaking countries, which likely inserts a lot of Western values but, interestingly, often the annotators are from Southeast Asia,” Ryan says of the team’s next steps. “We think that maybe part of the annotation process is biased. That is something that we will explore in future work.”

Stanford HAI’s mission is to advance AI research, education, policy and practice to improve the human condition. Learn more. 

iStock/Alessandro Biascioli
Share
Link copied to clipboard!
Contributor(s)
Andrew Myers

Related News

To Practice PTSD Treatment, Therapists Are Using AI Patients
Sarah Wells
Nov 10, 2025
News
Doctor works on computer in the middle of a therapy session

Stanford's TherapyTrainer deploys AI to help therapists practice skills for written exposure therapy.

News
Doctor works on computer in the middle of a therapy session

To Practice PTSD Treatment, Therapists Are Using AI Patients

Sarah Wells
HealthcareNov 10

Stanford's TherapyTrainer deploys AI to help therapists practice skills for written exposure therapy.

Fei-Fei Li Wins Queen Elizabeth Prize for Engineering
Shana Lynch
Nov 07, 2025
News

The Stanford HAI co-founder is recognized for breakthroughs that propelled computer vision and deep learning, and for championing human-centered AI and industry innovation.

News

Fei-Fei Li Wins Queen Elizabeth Prize for Engineering

Shana Lynch
Computer VisionMachine LearningNov 07

The Stanford HAI co-founder is recognized for breakthroughs that propelled computer vision and deep learning, and for championing human-centered AI and industry innovation.

Our Racist, Terrifying Deepfake Future Is Here
Nature
Nov 03, 2025
Media Mention

“It connects back to my fear that the people with the fewest resources will be most affected by the downsides of AI,” says HAI Policy Fellow Riana Pfefferkorn in response to a viral AI-generated deepfake video.

Media Mention
Your browser does not support the video tag.

Our Racist, Terrifying Deepfake Future Is Here

Nature
Generative AIRegulation, Policy, GovernanceLaw Enforcement and JusticeNov 03

“It connects back to my fear that the people with the fewest resources will be most affected by the downsides of AI,” says HAI Policy Fellow Riana Pfefferkorn in response to a viral AI-generated deepfake video.