Whose Opinions Do Language Models Reflect?

This brief introduces a quantitative framework that allows policymakers to evaluate the behavior of language models and assess whose opinions they reflect.
Key Takeaways
Language models are shaped by a variety of inputs and opinions, from the individuals whose views are included in the training data to crowd workers who manually filter that data.
We found that language models fine-tuned with human feedback—meaning models that went through additional training with human input—were less representative of the general public’s opinions than models that were not fine-tuned.
It is possible to steer a language model toward the opinions of a particular demographic group by asking the model to respond as if it were a member of that group, but this can lead to undesirable side effects, such as exacerbating polarization and creating echo chambers.
We highlight the need for further research on the evaluation of language models that can help policymakers and regulators quantitatively assess language model behavior and compare it to human preferences and opinions.
Executive Summary
Since the November 2022 debut of ChatGPT, an AI chatbot developed by OpenAI, language models have been all over the news. But as people use chatbots to write stories, look up recipes, make travel plans, and even run a real estate business, journalists, policymakers, and members of the public are increasingly paying attention to the important question of whose opinions these language models reflect. In particular, one emerging concern is that AI-generated text may be able to influence our views, including political beliefs, without our realizing it.
Language models, chatbots included, are shaped by a variety of data inputs. These inputs come from internet users whose writing forms the training data (such as the authors of internet comments or blog posts), crowd workers who provide feedback on how to improve data or models (as OpenAI did in Kenya), and the developers themselves (who make high-level decisions about data collection and training). The data used to build language models therefore represents a range of individuals and draws on a wide variety of opinions about sports, politics, culture, food, and many other topics. Meanwhile, language models are being asked subjective questions that have no clear right or wrong answer.
In our paper, "Whose Opinions Do Language Models Reflect?," we introduce a quantitative framework to answer this very question. The framework includes a dataset we developed to evaluate how closely language models align with the views of 60 demographic groups in the United States across a broad range of topics. Using this framework, we find a major gap between the responses provided by language models and the views of demographic groups in the United States. We also identify a number of U.S. groups whose views are poorly reflected by current language models, such as people 65 years of age and older, widowed individuals, and people who regularly attend religious services. Our framework allows policymakers to quantitatively evaluate language models and serves as a reminder that issues of representation in language models should remain front of mind.
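To make the comparison concrete, the sketch below shows one simple way to quantify the gap between a language model's answer distribution on a multiple-choice survey question and a demographic group's answers, using the 1-Wasserstein distance over ordered answer choices. The function name, answer distributions, and metric choice here are illustrative assumptions for exposition, not the paper's released implementation.

    # Illustrative sketch: compare a language model's answer distribution on one
    # multiple-choice survey question with a demographic group's distribution.
    # The numbers below are made up for illustration.

    def wasserstein_1d(p, q):
        """1-Wasserstein distance between two distributions over ordered answer choices."""
        assert len(p) == len(q)
        dist, cdf_p, cdf_q = 0.0, 0.0, 0.0
        for pi, qi in zip(p, q):
            cdf_p += pi          # running cumulative probability for the model
            cdf_q += qi          # running cumulative probability for the group
            dist += abs(cdf_p - cdf_q)
        return dist

    # Hypothetical distributions over four ordinal options
    # ("strongly agree" ... "strongly disagree") for one survey question.
    model_answers = [0.10, 0.20, 0.40, 0.30]   # probabilities assigned by the model
    group_answers = [0.35, 0.30, 0.20, 0.15]   # weighted survey responses from the group

    print(f"misalignment score: {wasserstein_1d(model_answers, group_answers):.2f}")

A lower score indicates that the model's answers track the group's opinions more closely; averaging such scores over many questions and groups is one way a framework like ours can surface which groups are poorly represented.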
