HAI Policy Briefs
September 2023
Whose Opinions Do Language Models Reflect?
Since the November 2022 debut of ChatGPT, language models have been all over the news. But as people use chatbots to write stories and look up recipes, to make travel plans, and even to further their real estate businesses, journalists, policymakers, and members of the public are increasingly paying attention to the important question of whose opinions these language models reflect. In particular, one emerging concern is that AI-generated text may influence our views, including our political beliefs, without our realizing it. This brief introduces a quantitative framework that policymakers can use to evaluate the behavior of language models and assess what kinds of opinions they reflect.
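The framework compares the distribution of a model's answers on multiple-choice survey questions to the distribution of responses from human survey participants. The sketch below illustrates one such comparison using a normalized 1-Wasserstein distance over ordinal answer choices; the function name, normalization, and example numbers are illustrative assumptions rather than the exact implementation behind this brief.

```python
import numpy as np

def representativeness(model_dist, human_dist):
    """Score how closely a model's answer distribution matches human
    survey responses on one ordinal multiple-choice question.

    Both arguments are probability vectors over the same K answer
    options (e.g., "Strongly disagree" ... "Strongly agree").
    Returns 1 minus a normalized 1-Wasserstein distance, so 1.0 means
    identical distributions and 0.0 means maximally divergent ones.
    """
    model_dist = np.asarray(model_dist, dtype=float)
    human_dist = np.asarray(human_dist, dtype=float)
    k = len(model_dist)
    # On the ordered options {0, ..., K-1}, the 1-Wasserstein distance
    # equals the L1 distance between the cumulative distributions.
    w1 = np.abs(np.cumsum(model_dist) - np.cumsum(human_dist)).sum()
    return 1.0 - w1 / (k - 1)  # divide by the largest possible distance

# Hypothetical example: the model concentrates on one end of the scale
# while survey respondents are spread evenly across all five options.
model = [0.05, 0.10, 0.15, 0.30, 0.40]
public = [0.20, 0.20, 0.20, 0.20, 0.20]
print(f"representativeness: {representativeness(model, public):.3f}")
```

Averaged over many survey questions, a score like this offers one way to quantify how well a model tracks the opinions of a given population or demographic group.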
Key Takeaways
➜ Language models are shaped by a variety of inputs and opinions, from the individuals whose views are included in the training data to the crowd workers who manually filter that data.
➜ We found that language models fine-tuned with human feedback—meaning models that went through additional training with human input—were less representative of the general public’s opinions than models that were not fine-tuned.
➜ It is possible to steer a language model toward the opinions of a particular demographic group by asking the model to respond as if it were a member of that group (see the sketch after this list), but this can lead to undesirable side effects, such as exacerbating polarization and creating echo chambers.
➜ We highlight the need for further research on language model evaluation, which can help policymakers and regulators quantitatively assess model behavior and compare it to human preferences and opinions.
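The steering mentioned in the third takeaway can be done with a simple persona prefix added to the survey question. The template below is a hypothetical illustration; the exact wording, helper function, and example group are assumptions, not the specific prompts evaluated in the underlying research.

```python
def steering_prompt(group: str, question: str, options: list[str]) -> str:
    """Build a persona-steered prompt that asks a language model to
    answer a multiple-choice survey question as if it belonged to a
    given demographic group. The phrasing here is illustrative only.
    """
    # Label the answer options A, B, C, ... on separate lines.
    choices = "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options))
    return (
        f"Answer the following question as if you were {group}.\n\n"
        f"Question: {question}\n{choices}\n"
        "Respond with a single letter."
    )

# Hypothetical usage: steer toward one demographic group.
print(steering_prompt(
    "a 65-year-old retiree living in a rural area",
    "How concerned are you about data privacy?",
    ["Very concerned", "Somewhat concerned", "Not concerned"],
))
```

Because the model is only asked to imitate a group, its answers may flatten that group's internal diversity of views, which is one route to the polarization and echo-chamber effects noted in the takeaway above.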