Addressing Equity in Natural Language Processing of English Dialects | Stanford HAI
Addressing Equity in Natural Language Processing of English Dialects

Date: June 12, 2023
Topics: Natural Language Processing, Machine Learning

The Multi-VALUE framework achieves consistent performance across dozens of English dialects.

At this point, bias in AI and natural language processing (NLP) is such a well-documented and frequent issue in the news that when researchers and journalists point out yet another example of prejudice in language models, readers can hardly be surprised. 

However, less well-publicized are the talented minds working to solve these issues of bias, like Caleb Ziems, a third-year PhD student mentored by Diyi Yang, assistant professor in the Computer Science Department at Stanford and an affiliate of Stanford’s Institute for Human-Centered AI (HAI). The research of Ziems and his colleagues led to the development of Multi-VALUE, a suite of resources that aims to address equity challenges in NLP, specifically the performance drops observed for different English dialects. The result could be AI tools, from voice assistants to translation and transcription services, that are fairer and more accurate for a wider range of speakers.

“It’s no secret that language technologies have issues with equity in their capacity to operate with speakers of different languages and different varieties of language,” Ziems says. “English is a global contact language which individuals from different communities use to interact with the global economy, global markets, and global partners. So it’s important for accessibility that language technologies can handle the disparities and variations in English.” 

Analyzing Grammar, Not Vocabulary

Current language technologies, which are typically trained on Standard American English (SAE), are fraught with performance issues when handling other English variants. “We’ve seen performance drops in question-answering for Singapore English, for example, of up to 19 percent,” says Ziems. Many of these variants are also considered “low resource,” meaning there’s a paucity of natural, real-world examples of people using them.

Ziems reframed this data-scarcity challenge by looking at the data the researchers do have in abundance. “We used decades of linguistic research housed in a rich online catalog that acts essentially as a structured database of features and rules of these English variants.” By examining the grammatical role that each word plays in a sentence for a specific variant, Ziems could tag and rearrange words, transforming SAE phrases into phrases for different English dialects. For example, the SAE phrase “John was scolded by his boss” would be transformed into Colloquial Singapore English as “John give his boss scold.”
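The rule-based idea can be illustrated with a toy sketch. The Python below is not the Multi-VALUE implementation: it assumes a single hypothetical rule, uses a regular expression in place of a real grammatical parser, and relies on a small hand-written lemma table. It only reproduces the passive example above.

```python
import re

# Hypothetical lemma table; a real system would use a morphological analyzer.
LEMMAS = {"scolded": "scold", "praised": "praise", "helped": "help"}

# Toy pattern for a simple SAE passive: "<patient> was <participle> by <agent>".
PASSIVE = re.compile(
    r"^(?P<patient>.+?) was (?P<verb>\w+) by (?P<agent>.+?)\.?$"
)


def give_passive(sentence: str) -> str:
    """Rewrite a simple passive with a 'give' construction (toy rule).

    Sentences that do not match the pattern, or whose participle is not in
    the lemma table, are returned unchanged.
    """
    match = PASSIVE.match(sentence.strip())
    if match is None:
        return sentence
    lemma = LEMMAS.get(match["verb"].lower())
    if lemma is None:
        return sentence
    return f"{match['patient']} give {match['agent']} {lemma}."


print(give_passive("John was scolded by his boss."))
```

A regular expression is used here only to keep the sketch short. As the article notes, the actual approach works from the grammatical role of each word, which a surface pattern like this cannot capture in general.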

As Ziems relates, “Many of these patterns were observed by field linguists operating in an oral context with native speakers, and then transcribed.” With this empirical data and the subsequent language rules, Ziems could build a framework for language transformation. Looking at parts of speech and grammatical rules for these dialects enabled Ziems to take an SAE sentence like “She doesn’t have a camera” and break it down into its discrete parts. “We might identify that there’s a negation in there, ‘not,’ and that the verb ‘do’ is connected to that negation.” By analyzing parts of speech in this way, as opposed to just vocabulary, Ziems believes he and the research team have built a robust and comprehensive framework to achieve dialect invariance, meaning constant performance over dialect shifts.
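The analysis step Ziems describes, finding a negation and the auxiliary verb it attaches to, might look roughly like the following. This is a minimal sketch under stated assumptions: the `Token` class, the `NEGATED_AUX` table, and both functions are hypothetical names invented for illustration, and whitespace splitting stands in for a real tokenizer and parser.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Toy table: negated contractions mapped to (surface auxiliary, lemma).
NEGATED_AUX = {
    "doesn't": ("does", "do"),
    "don't": ("do", "do"),
    "didn't": ("did", "do"),
    "isn't": ("is", "be"),
    "wasn't": ("was", "be"),
}


@dataclass
class Token:
    text: str
    lemma: str
    is_negation: bool = False
    head: Optional[int] = None  # index of the token this one attaches to


def analyze(sentence: str) -> List[Token]:
    """Split a sentence into tokens, expanding negated contractions."""
    tokens: List[Token] = []
    # Normalize curly apostrophes and drop a trailing period.
    for word in sentence.replace("’", "'").rstrip(".").split():
        key = word.lower()
        if key in NEGATED_AUX:
            aux, lemma = NEGATED_AUX[key]
            tokens.append(Token(aux, lemma))
            # The negation attaches to the auxiliary that was just added.
            tokens.append(
                Token("not", "not", is_negation=True, head=len(tokens) - 1)
            )
        else:
            tokens.append(Token(word, key))
    return tokens


def negated_auxiliaries(tokens: List[Token]) -> List[Tuple[str, str]]:
    """Return (auxiliary lemma, negation) pairs found in the sentence."""
    return [
        (tokens[t.head].lemma, t.text)
        for t in tokens
        if t.is_negation and t.head is not None
    ]


tokens = analyze("She doesn’t have a camera.")
print([t.text for t in tokens])
print(negated_auxiliaries(tokens))
```

Once the negation and its auxiliary are identified as linked units, a dialect-specific rule can operate on that structure rather than on the surface string. Which rule applies depends on the dialect and is not specified in the article.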

Limitations and Next Steps

Though Ziems’ work takes an important step in exposing challenges with language variants, he is quick to acknowledge its limitations. “Dialects aren’t nicely bounded, fixed entities. It’s impossible to stop language from changing and acquiring new features, and even the linguistic observations of these features can shift in frequency between speakers from different regions.” 

Even so, Multi-VALUE allows researchers working with language technologies to build parallel datasets that can be used to train and augment their models. “It’s very inefficient to train a new model for every dialect. It’s just not practical for the real world, so moving forward, we can use these tools to train smaller portions of the model, called adapters, which are swappable and can change to adapt to a certain dialect.”
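The swappable-adapter idea can be sketched in a few lines. The example below is a deliberately simplified illustration, not the researchers' architecture: the shared "model" is a fixed function, each adapter is a hand-set offset rather than a trained neural layer, and the dialect key `dialect_a` and all numeric values are hypothetical.

```python
from typing import Callable, Dict, List

Vector = List[float]


def base_encoder(features: Vector) -> Vector:
    """Stand-in for a large shared model that is trained once and frozen."""
    return [2.0 * x for x in features]


def make_adapter(offsets: Vector) -> Callable[[Vector], Vector]:
    """Build a tiny residual adapter: output = hidden + per-dimension offset."""
    def adapter(hidden: Vector) -> Vector:
        return [h + o for h, o in zip(hidden, offsets)]
    return adapter


# One small adapter per dialect; the base model is shared by all of them.
ADAPTERS: Dict[str, Callable[[Vector], Vector]] = {
    "SAE": make_adapter([0.0, 0.0]),
    "dialect_a": make_adapter([0.5, -0.5]),  # hypothetical values
}


def encode(features: Vector, dialect: str) -> Vector:
    """Run the shared model, then the adapter selected for this dialect."""
    hidden = base_encoder(features)
    adapter = ADAPTERS.get(dialect, ADAPTERS["SAE"])  # fall back to SAE
    return adapter(hidden)


print(encode([1.0, 2.0], "SAE"))
print(encode([1.0, 2.0], "dialect_a"))
```

The design point is that supporting a new dialect means adding one small entry to the adapter table, while the expensive shared component stays untouched.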

Ziems also points out the considerable work already being done in the field of NLP in the English language. “Half the world’s population is bilingual or multilingual, and English is just one of the many tools in the toolkit. But it has such an outsized impact on the global economy, and it's a language where we can efficiently and effectively capture this problem of language equity.”

Stanford HAI’s mission is to advance AI research, education, policy and practice to improve the human condition.

Contributor(s)
Prabha Kannan