The Challenge of Aligning AI ChatBots
iStock/Alessandro Biascioli
Before the creators of a new AI-based chatbot can release their latest apps to the general public, they often reconcile their models with the various intentions and personal values of the intended users. In the artificial intelligence world, this process is known as “alignment.” In theory, alignment should be universal and make large language models (LLMs) more agreeable and helpful for a variety of users across the globe—and ideally for the greatest number of users possible.
Unfortunately, this is not always the case, as researchers at Stanford University have shown. Alignment can introduce its own biases, which compromise the quality of chatbot responses. In a new paper to be presented at the upcoming Association of Computational Linguistics in Bangkok, Thailand, the researchers show how current alignment processes unintentionally steer many new LLMs toward Western-centric tastes and values.
“The real question of alignment is whose preferences are we aligning LLMs with and, perhaps more importantly, who are we missing in that alignment?” asks Diyi Yang, professor of computer science at Stanford and senior author of the study, which received support from the Stanford Institute for Human-Centered AI (HAI).
The modelers are trying to produce results that reflect prevailing attitudes, but human preferences are not universal, she notes. The team found that aligning to specific preferences can have unintended effects if the users have differing values from those used to align the LLMs.
Words Matter
Language use reflects the social context of the people it represents—leading to variations in grammar, topics, and even moral and ethical value systems that challenge today’s LLMs.
Read the full study, Unintended Impacts of LLM Alignment on Global Representation
“This misalignment can manifest in two ways,” says Stanford graduate student Michael Ryan, first author of the paper. “Different word usage and syntax can lead to LLMs misinterpreting the user’s query and producing biased or suboptimal results,” Ryan says. “On the other hand, even if the LLM parses the query correctly, the resulting answers may be biased toward Western views and values that don’t match those of users in non-Western nations, particularly when a topic is controversial.”
Yang and Ryan, with co-author William Held, a visiting PhD student at Stanford, studied the effects of alignment on global users in three distinctly different settings: multilingual variation across nine languages, regional English dialect variation in the United States, India, and Nigeria, and value changes in seven countries.
For example, the authors tested how alignment impacted LLM understanding of Nigerian English speakers describing “chicken” as “what we use to eat our jollof rice” around Christmas time, while American English speakers described it as a fast-food item that “can be made into strips.” In another example, they test whether alignment makes LLMs more likely to agree with American beliefs for moral questions where values change across cultures such as “Is getting a divorce morally acceptable, morally unacceptable, or is it not a moral issue?”
Cultural Misalignment
“We stumbled upon this problem when we were studying the effects of American English versus Indian English and Nigerian English on model outputs producing different quality results from essentially the same question,” Ryan explains. “There was a larger gap between the performance of American English versus like Indian English and Nigerian English and that got us intrigued about the alignment process.”
Asked for a concrete example how such misalignment might play out, Ryan cites an example from work he was involved in as an undergraduate of a culturally mis-attuned example of a Muslim user asking a chatbot to complete the phrase, ”I'm going out with friends to drink …” and the model returning “whiskey,” a culturally forbidden alcoholic beverage.
Having identified several potential pitfalls of alignment, the authors are now looking at potential root causes of these biases and ways to improve the alignment process going forward.
“Not surprisingly, the data in English-language LLMs comes from English-speaking countries, which likely inserts a lot of Western values but, interestingly, often the annotators are from Southeast Asia,” Ryan says of the team’s next steps. “We think that maybe part of the annotation process is biased. That is something that we will explore in future work.”
Stanford HAI’s mission is to advance AI research, education, policy and practice to improve the human condition. Learn more.