
Why AI Struggles To Recognize Toxic Speech on Social Media

Date: July 13, 2021
Topics: Design, Human-Computer Interaction; Machine Learning; Communications, Media

AI speech police are smart and fast, so why is there a gap between strong algorithmic performance and reality? 

Facebook says its artificial intelligence models identified and pulled down 27 million pieces of hate speech in the final three months of 2020. In 97 percent of the cases, the systems took action before humans had even flagged the posts.

That’s a huge advance, and all the other major social media platforms are using AI-powered systems in similar ways. Given that people post hundreds of millions of items every day, from comments and memes to articles, there’s no real alternative. No army of human moderators could keep up on its own.

But a team of human-computer interaction and AI researchers at Stanford sheds new light on why automated speech police can score highly on technical tests yet provoke a lot of dissatisfaction from humans with their decisions. The main problem: There is a huge difference between evaluating more traditional AI tasks, like recognizing spoken language, and the much messier task of identifying hate speech, harassment, or misinformation, especially in today’s polarized environment.

Read the study: The Disagreement Deconvolution: Bringing Machine Learning Performance Metrics In Line With Reality

 

“It appears as if the models are getting almost perfect scores, so some people think they can use them as a sort of black box to test for toxicity,” says Mitchell Gordon, a PhD candidate in computer science who worked on the project. “But that’s not the case. They’re evaluating these models with approaches that work well when the answers are fairly clear, like recognizing whether ‘java’ means coffee or the computer language, but these are tasks where the answers are not clear.”

The team hopes their study will illuminate the gulf between what developers think they’re achieving and the reality — and perhaps help them develop systems that grapple more thoughtfully with the inherent disagreements around toxic speech.

Too Much Disagreement

There are no simple solutions, because there will never be unanimous agreement on highly contested issues. Making matters more complicated, people are often ambivalent and inconsistent about how they react to a particular piece of content.

In one study, for example, human annotators rarely reached agreement when they were asked to label tweets that contained words from a lexicon of hate speech. Only 5 percent of the tweets were acknowledged by a majority as hate speech, while only 1.3 percent received unanimous verdicts. In a study on recognizing misinformation, in which people were given statements about purportedly true events, only 70 percent agreed on whether most of the events had or had not occurred.

Despite this challenge for human moderators, conventional AI models achieve high scores on recognizing toxic speech: 0.95 on ROC AUC, a popular metric for evaluating AI models in which 0.5 means pure guessing and 1.0 means perfect performance. But the Stanford team found that the real score is much lower, at most 0.73, if you factor in the disagreement among human annotators.
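For readers unfamiliar with the metric, here is a minimal illustration of ROC AUC using scikit-learn. The scores and labels below are invented for the example; this is not the study's data or evaluation code.

```python
# Illustrative only: made-up toxicity scores and labels, not the study's data.
from sklearn.metrics import roc_auc_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]  # 1 = annotators called the post toxic
y_score = [0.92, 0.10, 0.75, 0.60, 0.35, 0.05, 0.88, 0.40]  # model's toxicity scores

# 1.0 means the model ranks every toxic item above every non-toxic one;
# 0.5 is what random guessing would achieve.
print(roc_auc_score(y_true, y_score))
```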

Reassessing the Models

In a new study, the Stanford team re-assesses the performance of today’s AI models by getting a more accurate measure of what people truly believe and how much they disagree among themselves.

The study was overseen by Michael Bernstein and Tatsunori Hashimoto, associate and assistant professors of computer science and faculty members of the Stanford Institute for Human-Centered Artificial Intelligence (HAI). In addition to Gordon, Bernstein, and Hashimoto, the paper’s co-authors include Kaitlyn Zhou, a PhD candidate in computer science, and Kayur Patel, a researcher at Apple Inc.

To get a better measure of real-world views, the researchers developed an algorithm to filter out the “noise” (ambivalence, inconsistency, and misunderstanding) from how people label things like toxicity, leaving an estimate of the amount of true disagreement. They focused on how consistently each annotator labeled the same kind of language in the same way. The most consistent or dominant responses became what the researchers call “primary labels,” which they then used to build a more precise dataset, one that captures more of the true range of opinions about potentially toxic content.
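As a rough sketch of that idea (a hypothetical helper, not the authors' released code), an annotator's primary label for a given item can be estimated as the response they give most often across repeated labelings:

```python
from collections import Counter

def primary_label(repeated_labels):
    """Estimate an annotator's stable view of an item: the response they give
    most often across repeated labelings, treating one-off flips as noise."""
    return Counter(repeated_labels).most_common(1)[0][0]

# One annotator rates the same comment three times on different days.
print(primary_label(["toxic", "toxic", "not toxic"]))  # -> "toxic"
```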

The team then used that approach to refine datasets that are widely used to train AI models to spot toxicity, misinformation, and pornography. By applying existing AI metrics to these new “disagreement-adjusted” datasets, the researchers revealed dramatically less confidence about decisions in each category. Instead of getting nearly perfect scores on all fronts, the AI models achieved only 0.73 ROC AUC in classifying toxicity and 62 percent accuracy in labeling misinformation. Even for pornography (as in, “I know it when I see it”), the accuracy was only 0.79.
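In the same spirit, here is a simplified, hypothetical version of a disagreement-adjusted score (not the paper's metric): each model decision is credited by the fraction of annotators whose primary label it matches, so contested items cap the best achievable score.

```python
def disagreement_adjusted_accuracy(decisions, primary_labels_per_item):
    """Average, over items, the fraction of annotators whose primary label
    matches the model's decision. Unanimous items behave like ordinary
    accuracy; split items limit how high any single decision can score."""
    per_item = [
        sum(label == decision for label in labels) / len(labels)
        for decision, labels in zip(decisions, primary_labels_per_item)
    ]
    return sum(per_item) / len(per_item)

# Two items: one unanimous, one split 2-1 among three annotators.
decisions = ["toxic", "toxic"]
labels = [["toxic", "toxic", "toxic"], ["toxic", "not toxic", "not toxic"]]
print(disagreement_adjusted_accuracy(decisions, labels))  # (1.0 + 1/3) / 2 ≈ 0.67
```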

Someone Will Always Be Unhappy. The Question Is Who?

Gordon says AI models, which must ultimately make a single decision, will never assess hate speech or cyberbullying to everybody’s satisfaction. There will always be vehement disagreement. Giving human annotators more precise definitions of hate speech may not solve the problem either, because people end up suppressing their real views in order to provide the “right” answer.

But if social media platforms have a more accurate picture of what people really believe, as well as which groups hold particular views, they can design systems that make more informed and intentional decisions.

In the end, Gordon suggests, annotators as well as social media executives will have to make value judgments with the knowledge that many decisions will always be controversial.

“Is this going to resolve disagreements in society? No,” says Gordon. “The question is what can you do to make people less unhappy. Given that you will have to make some people unhappy, is there a better way to think about whom you are making unhappy?”

Stanford HAI's mission is to advance AI research, education, policy and practice to improve the human condition.

Contributor: Edmund L. Andrews