"We shape our buildings; thereafter, they shape us."
Winston Churchill was most definitely referring to physical buildings when he spoke these words. But online social spaces shape our behavior just as physical buildings do, and it is artificial intelligence that shapes social media (Eckles 2021, Narayanan 2023). AI is used to determine what is at the top of our feeds (Backstrom 2016, Fischer 2020), who we might connect with (Guy 2009), and what should be moderated, labeled with a warning, or outright removed.
As the traditional critique goes, social media AIs are optimized for engagement, and engagement signals such as clicks, likes, replies, and shares do play strong roles in these algorithms. But this is not the complete story. Recognizing the shortcomings of engagement signals, platforms have complemented their algorithms with a battery of surveys, moderation algorithms, downranking signals, peer effect estimation, and other signals.
But the values encoded in these signals remain overwhelmingly focused on the individual. Values centered on individual experience, such as personal agency, enjoyment, and stimulation, are undeniably important and central requirements for any social media platform. It shouldn't be surprising that reward hacking on individual values alone will lead to challenging societal-level outcomes, because the algorithm has no way to reason about societies. But then, what would it even mean to algorithmically model societal-level values? How would you tell an algorithm that it needs to care about democracy in addition to agency and enjoyment?
Could we directly embed societal values into social media AIs?
In a new commentary that we published in the Journal of Online Trust & Safety, colleagues and I argue that it is now possible to directly embed societal values into social media AIs: not indirectly, as in balancing conservative and liberal items in your feed, but by treating those values as direct objective functions of the algorithmic system. What if social media algorithms could provide users and platforms levers to directly encode and optimize not just individual values, but also a wide variety of societal values? And if we can do that, could we also apply similar strategies to shape value alignment problems in AIs more broadly?
Efforts such as Reinforcement Learning from Human Feedback (RLHF; Ziegler et al. 2019) and Constitutional AI (Bai et al. 2022) provide some mechanisms for shaping AI systems, but we argue that there is a rich untapped vein: the social sciences. Although societal values might seem slippery, social scientists have collectively operationalized, measured, refined, and replicated these constructs over decades. And it's exactly this precision that now allows us to encode these values directly into algorithms. This raises an opportunity to create a translational science between the social sciences and engineering.
Ultimately, our point of view is that social media AIs may be suffering from a failure of imagination more so than any political impasse. We've labored under the pretense that, with these algorithms, there is a fundamental tradeoff between free speech on one hand and societal values on the other. But we believe that there remains substantial slack in the system: because we haven't stopped to directly model the values at stake, algorithms are falling far short of their ability to optimize for these values. As a result, we believe that we can push out the Pareto frontier: that models can improve outcomes across multiple values simultaneously, simply by being explicit about those values.
Feed Algorithms Already Embed Values
There is no "neutral" here: whether we like it or not, feed algorithms already embed values. These algorithms have been trained, either explicitly or implicitly, to encode notions of content or behaviors that are considered "good," such as content that garners more views. Values are beliefs about desirable end states; they derive from basic human needs and manifest in different ways across cultures (Rokeach 1973, Schwartz and Bilsky 1987, Schwartz 1992). News feed algorithms, like any designed artifact, reflect and promote the values of their developers (Seaver 2017).
Engagement signals, surveys, wellbeing measures, and other inputs to social media AIs focus on individual values: they prioritize personal agency, individual enjoyment, and individual stimulation. If we want our feeds to consider other values, we need to model them directly.
A Translational Science from the Social Sciences to AI Model Design
Can we lift our gaze from individual to societal values? What does it mean to encode a societal value into an algorithm? If we cannot formulate an objective that says what we mean, then once that objective is handed off to an AI, we are condemned to mean what we say.
Fields such as sociology, political science, law, communication, public health, science and technology studies, and psychology have long developed constructs to operationalize, describe, and measure complex social phenomena. These constructs have been shown to be reliable through repeated study and testing. Along the way, social scientists often develop measurement scales or codebooks to promote inter-rater reliability and replicability. We observe that these codebooks and constructs are now precise enough to translate into an artificial intelligence model. For example, Jia et al. (2023) translate a measure of antidemocratic attitudes into a prompt for in-context learning by a large language model such as ChatGPT. Their work draws on Voelkel et al.'s antidemocratic attitude scale (Voelkel 2023), in which, for instance, one variable captures "support for partisan violence." The original survey questions turn into a prompt for a large language model:
Please rate the following message’s support for partisan violence from 1 to 3. Support for partisan violence is defined as "willingness to use violent tactics against outpartisans." Examples of partisan violence include sending threatening and intimidating messages to the opponent party, harassing the opponent party on the Internet, using violence in advancing their political goals or winning more races in the next election.
Your rating should consider whether the following factors exist in the following message:
- A: Show support for partisan violence
- B1: Partisan name-calling
- B2: Emotion or exaggeration
Rate 1 if doesn’t satisfy any of the factors
Rate 2 if doesn’t satisfy A, but satisfies B1 or B2
Rate 3 if satisfies A, B1 and B2
After your rating, please provide reasoning in the following format:
Rating: ### Reason: (### is the separator)
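To make the requested output format concrete, here is a minimal Python sketch of how a reply to this prompt might be parsed. It assumes the model answers in the shape "Rating: 2 ### Reason: ..."; the names Rating and parse_reply are hypothetical and are not taken from Jia et al.'s implementation.

```python
import re
from typing import NamedTuple, Optional


class Rating(NamedTuple):
    """A parsed model reply: the 1-3 score and the free-text reason."""
    score: int
    reason: str


# Assumed reply shape: "Rating: <1-3> ### Reason: <free text>".
_REPLY = re.compile(r"Rating:\s*([1-3])\s*###\s*Reason:\s*(.*)", re.DOTALL)


def parse_reply(reply: str) -> Optional[Rating]:
    """Return the parsed rating, or None if the reply is off-format."""
    match = _REPLY.search(reply)
    if match is None:
        return None
    return Rating(int(match.group(1)), match.group(2).strip())


if __name__ == "__main__":
    print(parse_reply("Rating: 2 ### Reason: Name-calling, but no support for violence."))
```

Returning None for off-format replies, rather than guessing a score, keeps malformed outputs visible so they can be retried or sent to a human annotator.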
The recipe for this translation is:
- Identify a social science construct that measures the societal value of interest (e.g., reducing partisan violence).
- Translate the social science construct into an automated model (e.g., adapting the qualitative codebook or survey into a prompt to a large language model).
- Test the accuracy of the AI model against validated human annotations.
- Integrate the model into the social media algorithm.
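The third step of this recipe can be sketched in a few lines of Python. The example below compares model ratings with human ratings on the same messages using raw accuracy and Cohen's kappa, a common chance-corrected agreement statistic. Kappa is our choice for illustration and is not necessarily the metric used in the cited work; the function names and the sample labels are hypothetical.

```python
from collections import Counter
from typing import Sequence


def _check(human: Sequence[int], model: Sequence[int]) -> int:
    """Validate the two label lists and return their shared length."""
    if len(human) != len(model) or len(human) == 0:
        raise ValueError("need two non-empty label lists of equal length")
    return len(human)


def accuracy(human: Sequence[int], model: Sequence[int]) -> float:
    """Fraction of items on which the model matches the human label."""
    n = _check(human, model)
    return sum(h == m for h, m in zip(human, model)) / n


def cohens_kappa(human: Sequence[int], model: Sequence[int]) -> float:
    """Agreement corrected for the agreement expected by chance."""
    n = _check(human, model)
    observed = accuracy(human, model)
    human_counts, model_counts = Counter(human), Counter(model)
    # Chance agreement: probability both raters pick label k independently.
    expected = sum(human_counts[k] * model_counts[k] for k in human_counts) / (n * n)
    if expected == 1.0:
        # Assumption: both raters used one identical label throughout;
        # kappa is undefined here, so we report perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    human_labels = [1, 1, 2, 3, 3, 1]  # made-up annotations for illustration
    model_labels = [1, 2, 2, 3, 3, 1]  # made-up model ratings
    print(accuracy(human_labels, model_labels))
    print(cohens_kappa(human_labels, model_labels))
```

Reporting a chance-corrected statistic alongside accuracy matters when one rating dominates the data, since a model that always outputs the most common rating can look accurate while agreeing with annotators no better than chance.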
Opening Up Pandora's Feed
Who gets to decide which values are included? When there are differences, especially in multicultural, pluralistic societies, who gets to decide how they should be resolved? Going even further, embedding some societal values will inevitably undermine other values.
On one hand, certain values might seem like unobjectionable table stakes for a system operating in a healthy democracy: e.g., reducing content harmful to democratic governance by inciting violence, reducing disinformation and affective polarization, and increasing content beneficial to democratic governance via promoting civil discourse. But each of these is already complex. TikTok wants to be "the last happy corner of the internet" (Voth 2020), which can imply demoting political content, and Meta's Threads platform has expressly stated that amplifying political content is not its goal. If encoding societal values comes at the cost of engagement or user experience, is that a pro- or antidemocratic goal?
On the policy side: how should, or should not, the government be involved in regulating the values that are implicitly or explicitly encoded into social media AIs? Are these decisions considered protected speech by the platforms? In the United States context, would the First Amendment even allow the government to weigh in on the values in these algorithms? Does a choice by a platform to prioritize its own view of democracy raise its own free speech concerns? Given the global reach of social media platforms, how do we navigate the issues of an autocratic society imposing top-down values? Our position here is that an important first step is to make the values embedded in the algorithm explicit, so that they can be deliberated over.
A first step is to make sure that we model the values that are in conflict with each other. We cannot manage values if we cannot articulate them. For example, what might be harmed if an algorithm aims to reduce partisan animosity? Psychologists have developed a theory and measurement of basic values (Schwartz 2012), which we can use to sample values that are both similar and different across cultures and ethnicities.
We also need mechanisms for resolving value conflicts. Currently, platforms assign weights to each component of their ranking model, then integrate these weights to make final decisions. Our approach fits neatly into this existing framework. However, there is headroom for improved technical mechanisms for eliciting and making these trade-offs. One step might be increased participation, as determined through a combination of democratic participation and a bill of rights. An additional step is to elicit tensions between these values and when each one ought to be prioritized over another: for example, under what conditions might speech that increases partisan animosity be downranked, and under what conditions might it be amplified instead? What combination of automated signals and procedural processes will decide this?
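As a rough illustration of how a societal-value signal could slot into a weighted ranking model of this kind, consider the Python sketch below. It assumes each feed item carries per-signal scores and that the final score is a simple weighted sum; the signal names, weights, and values are invented for the example and do not describe any real platform's ranker.

```python
from typing import Dict, List, Tuple

Item = Tuple[str, Dict[str, float]]  # (item id, per-signal scores)


def score_item(signals: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the item's signals; a missing signal counts as 0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())


def rank_feed(items: List[Item], weights: Dict[str, float]) -> List[str]:
    """Return item ids ordered from highest to lowest combined score."""
    ordered = sorted(items, key=lambda item: score_item(item[1], weights), reverse=True)
    return [item_id for item_id, _ in ordered]


if __name__ == "__main__":
    # Hypothetical signals: predicted engagement in [0, 1] and a
    # partisan-violence indicator (1.0 if the model flagged the item).
    feed = [
        ("post_a", {"engagement": 0.9, "partisan_violence": 1.0}),
        ("post_b", {"engagement": 0.6, "partisan_violence": 0.0}),
    ]
    engagement_only = {"engagement": 1.0, "partisan_violence": 0.0}
    with_societal_value = {"engagement": 1.0, "partisan_violence": -2.0}
    print(rank_feed(feed, engagement_only))      # post_a ranks first
    print(rank_feed(feed, with_societal_value))  # post_b ranks first
```

In this framing the value trade-off lives entirely in the weights, which is why the question of who sets them, and through what process, carries so much of the argument.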
Is the right avenue a set of user controls, as in the "middleware" solutions pitched by Bluesky and others, a set of platform-centric solutions, or both? Middleware sidesteps many thorny questions, but HCI has known for decades that very few users change defaults, so there need to be processes for determining those defaults.
Ultimately, our point of view is that feed algorithms always embed values, so we ought to be reflective and deliberative about which ones we include. An explicit modeling of societal values, paired with technical advances in natural language processing, will enable us to model a wide variety of constructs from the social and behavioral sciences, providing a powerful toolkit for shaping our online experiences.
Authors: Stanford CS associate professor Michael S. Bernstein, Stanford communication associate professor Angèle Christin, Stanford communication professor Jeffrey T. Hancock, Stanford CS assistant professor Tatsunori Hashimoto, Northeastern assistant professor Chenyan Jia, Stanford CS PhD candidate Michelle Lam, Stanford CS PhD candidate Nicole Meister, Stanford professor of law Nathaniel Persily, Stanford postdoctoral scholar Tiziano Piccardi, University of Washington incoming assistant professor Martin Saveski, Stanford psychology professor Jeanne L. Tsai, Stanford MS&E associate professor Johan Ugander, and Stanford psychology postdoctoral scholar Chunchen Xu.
Stanford HAI’s mission is to advance AI research, education, policy and practice to improve the human condition.