Stanford HAI junior fellow Johannes Eichstaedt built an algorithm that can provide, in principle, a real-time indication of community health.
Social media can reveal more than just a single person’s mood or frame of mind. It can capture the psychological states of an entire population, according to new research by Stanford scholar Johannes Eichstaedt.
Eichstaedt’s study, published April 27 in the Proceedings of the National Academy of Sciences, found that through machine learning – teaching a computer to identify and analyze patterns in large datasets – researchers can see, in principle, how a society is doing in real time.
“These methods really show how to do psychological measurement in the 21st century in our digital world,” said Eichstaedt, who is an assistant professor of psychology in the School of Humanities and Sciences and a junior fellow at the Stanford Institute for Human-Centered Artificial Intelligence.
For the past decade, Eichstaedt has tested how to use social media, including Twitter, as a way to measure the well-being of a community. He contends that social media provides the largest data set on behavior, emotions and thoughts in human history.
The researchers acknowledge in the paper that Twitter users are not representative of the U.S. population, but the platform can still provide insight into how people experience their everyday lives.
“What we really care about is how well the population is doing in terms of psychological and physical health, rather than merely that the GDP is growing,” said Eichstaedt. “You might not care about measuring subjective well-being in and of itself, but subjective well-being impacts mortality, including heart disease. It also impacts the economic bottom lines. So, it’s quite an important variable to capture for a population.”
From Survey Research to Social Media
To evaluate the different ways to analyze a region’s well-being, Eichstaedt and a team of researchers compared more than a billion geotagged Tweets posted between 2009 and 2015 with 1.7 million responses to the Gallup-Sharecare Well-Being Index, an in-depth survey that measures how people experience everyday life.
Researchers have long relied on surveys like Gallup’s to measure a population’s well-being. While accurate, such surveys can be costly and time-consuming undertakings. Sometimes it takes years to gather enough data for rough community estimates, said Eichstaedt.
But data-driven techniques can ease some of that burden. Eichstaedt found that an algorithm can be trained on two things from the same respondents: their answers to a written well-being survey and a sample of their social media posts. Once trained, it can be deployed on a much larger scale to predict, based only on their Tweets, how people from an entire region would have responded on a traditional survey.
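The train-then-deploy idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' actual model: it swaps in a simple linear regression on word frequencies fitted by gradient descent, and every post, survey score, region name and function name (`word_rates`, `fit_linear`, `predict`) is invented for the example.

```python
# Toy illustration only: all posts, scores, and names below are invented.
from collections import Counter

def word_rates(text, vocab):
    """Relative frequency of each vocabulary word in a piece of text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    return [counts[w] / total for w in vocab]

def fit_linear(X, y, lr=0.5, epochs=2000, l2=0.001):
    """Lightly regularized linear regression fitted by plain gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = b + sum(wj * xj for wj, xj in zip(w, xi)) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x):
    """Predicted survey score for one feature vector."""
    return b + sum(wj * xj for wj, xj in zip(w, x))

# Step 1: hypothetical users who both answered a survey and shared posts.
train = [
    ("so excited for the weekend fun with friends", 8.0),
    ("fun hike today excited to go again", 7.5),
    ("bored and tired again nothing to do", 4.0),
    ("tired of this so bored", 3.5),
]
vocab = sorted({tok for text, _ in train for tok in text.lower().split()})
X = [word_rates(text, vocab) for text, _ in train]
y = [score for _, score in train]
w, b = fit_linear(X, y)

# Step 2: apply the fitted model to posts whose authors never took the survey,
# then average the predictions within each (hypothetical) region.
region_posts = {
    "county_a": ["excited about the fun fair", "fun day with friends"],
    "county_b": ["so tired today", "bored again"],
}
region_estimate = {
    region: sum(predict(w, b, word_rates(p, vocab)) for p in posts) / len(posts)
    for region, posts in region_posts.items()
}
```

With this toy data, the estimate for `county_a` should come out higher than the one for `county_b`, because the model has associated words such as "excited" and "fun" with higher survey scores. A real system would need far more data and careful validation against survey results.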
Understanding Words out of Context
Before machine learning methods were used, researchers either picked words or asked raters to annotate words for how “positive” they are. But it can be very tricky to pick words that measure well-being, said Eichstaedt.
For example, the researchers found that internet slang such as “LOL” – the popular acronym for “laugh out loud” – and the words “good” and “love” were frequently used in areas with lower income and education (and, in general, lower well-being). So even though these might seem like positive words, they may not be, Eichstaedt said.
Similarly, words like “homework” and “taxes” might seem negative out of context, but the researchers found that they were used more by people with higher education and income – a group that other studies have found typically has higher well-being.
“When picking words to measure well-being, it’s really important to pay attention to cultural differences in language use across the U.S.,” said Eichstaedt.
But machine learning methods can help determine which words are more important than others. When the algorithm compared a person’s social media posts against their survey responses, it learned that words like “LOL” are not reliable indicators of well-being and relied instead on words such as “fun” and “excited.”
“Having the computer learn the words may be the best way to find words that measure well-being,” Eichstaedt said. “Differences in language use can be quite complex.”
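One simple way to see why learned words can differ from hand-picked ones is to correlate how often each user uses a word with that user's survey score. The sketch below does this with a plain Pearson correlation, which is a stand-in chosen for illustration rather than the method used in the paper. The usage rates, survey scores and the `hand_picked_positive` list are all invented.

```python
# Toy illustration only: the usage rates and survey scores below are invented.
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

# Hypothetical survey scores for six users, and how often each uses a word.
survey_scores = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
usage = {
    "lol":     [0.05, 0.04, 0.04, 0.02, 0.02, 0.01],
    "excited": [0.00, 0.01, 0.01, 0.02, 0.03, 0.03],
    "taxes":   [0.00, 0.00, 0.01, 0.01, 0.02, 0.02],
}

# Words a person might pick by intuition as "positive" (hypothetical list).
hand_picked_positive = {"lol", "excited"}

# Let the data decide: correlate each word's usage with the survey scores.
learned = {word: pearson(rates, survey_scores) for word, rates in usage.items()}

# Hand-picked "positive" words that actually track lower scores in this data.
misleading = sorted(w for w in hand_picked_positive if learned[w] < 0)
```

In this made-up data, `misleading` contains only "lol", while "taxes" ends up with a positive correlation even though it sounds negative, mirroring the pattern the researchers describe.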
The researchers note that well-being is also associated with other important factors, including overall health. For example, stress can induce unhealthy behaviors – such as excessive drinking or smoking – that in turn harm people’s health, Eichstaedt said.
“When people are suffering from depression and anxiety, we need to know so that we can ensure they have the resources they need,” said Eichstaedt, who is currently applying this method to study the impact of the novel coronavirus pandemic on the population of cities across the U.S.
“COVID-19 is a natural disaster that interrupts our social norms and routines at an unprecedented scale,” Eichstaedt said. “With this real-time Twitter-based technology, psychologists can monitor if loneliness and anxiety are taking hold in communities, and how our well-being is impacted by social distancing. There is no other data source that can provide such measurement at population scale and give estimates so quickly. Now more than ever, using robust machine learning methods is very important.”
Co-authors on the paper include Kokil Jaidka of the National University of Singapore; Salvatore Giorgi and Lyle H. Ungar of the University of Pennsylvania; H. Andrew Schwartz of Stony Brook University; and Margaret L. Kern of the University of Melbourne. Support for this research was provided by a Nanyang Presidential Postdoctoral Award, an Adobe Research Award, a Robert Wood Johnson Foundation Pioneer Award and a Templeton Religion Trust Grant.