Addressing Equity in Natural Language Processing of English Dialects | Stanford HAI

Date: June 12, 2023
Topics: Natural Language Processing, Machine Learning

The Multi-VALUE framework achieves consistent performance across dozens of English dialects.

At this point, bias in AI and natural language processing (NLP) is so well documented and so often in the news that when researchers and journalists point out yet another example of prejudice in language models, readers can hardly be surprised.

Less well publicized, however, are the talented minds working to solve these issues of bias, like Caleb Ziems, a third-year PhD student mentored by Diyi Yang, assistant professor in the Computer Science Department at Stanford and an affiliate of Stanford’s Institute for Human-Centered AI (HAI). The research of Ziems and his colleagues led to the development of Multi-VALUE, a suite of resources that aims to address equity challenges in NLP, specifically the performance drops observed for different English dialects. The result could be AI tools, from voice assistants to translation and transcription services, that are fairer and more accurate for a wider range of speakers.

“It’s no secret that language technologies have issues with equity in their capacity to operate with speakers of different languages and different varieties of language,” Ziems says. “English is a global contact language which individuals from different communities use to interact with the global economy, global markets, and global partners. So it’s important for accessibility that language technologies can handle the disparities and variations in English.” 

Analyzing Grammar, Not Vocabulary

Current language technologies, which are typically trained on Standard American English (SAE), are fraught with performance issues when handling other English variants. “We’ve seen performance drops in question-answering for Singapore English, for example, of up to 19 percent,” says Ziems. Many of these variants are also considered “low resource,” meaning there’s a paucity of natural, real-world examples of people using these varieties.

Ziems reframed this data-scarcity challenge by looking at the data that researchers do have in abundance. “We used decades of linguistic research housed in a rich online catalog that acts essentially as a structured database of features and rules of these English variants.” By examining the grammatical role that each word plays in a sentence for a specific variant, Ziems could tag and rearrange words, transforming SAE phrases into their counterparts in different English dialects. For example, the SAE phrase “John was scolded by his boss” would be transformed into Colloquial Singapore English as “John give his boss scold.”
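To make the tag-and-rearrange idea concrete, here is a minimal Python sketch of the one transformation quoted above. It is not the Multi-VALUE code or API: the `Token` class, the role labels, and the `give_passive` function are all hypothetical names chosen for illustration, and the grammatical roles are assigned by hand rather than by a parser.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    text: str   # surface form as it appears in the sentence
    lemma: str  # base form, e.g. "scold" for "scolded"
    role: str   # hand-assigned grammatical role (hypothetical label set)


def give_passive(tokens: List[Token]) -> Optional[str]:
    """Rewrite a passive clause with a by-agent as: subject + 'give' + agent + verb lemma.

    Returns None when the clause does not match this one pattern.
    """
    subject = [t.text for t in tokens if t.role == "subject"]
    agent = [t.text for t in tokens if t.role == "agent"]
    verbs = [t for t in tokens if t.role == "passive_verb"]
    if not subject or not agent or len(verbs) != 1:
        return None
    # The passive auxiliary and the "by" marker are dropped; the verb reverts to its lemma.
    return " ".join(subject + ["give"] + agent + [verbs[0].lemma])


# Hand-annotated version of the SAE example from the article.
sentence = [
    Token("John", "John", "subject"),
    Token("was", "be", "passive_aux"),
    Token("scolded", "scold", "passive_verb"),
    Token("by", "by", "agent_marker"),
    Token("his", "his", "agent"),
    Token("boss", "boss", "agent"),
]

print(give_passive(sentence))  # John give his boss scold
```

The sketch covers a single rule on a single sentence shape; a real system would need a parser to assign the roles and many more rules, one per documented dialect feature.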

As Ziems relates, “Many of these patterns were observed by field linguists operating in an oral context with native speakers, and then transcribed.” With this empirical data and the language rules derived from it, Ziems could build a framework for language transformation. Looking at parts of speech and grammatical rules for these dialects enabled Ziems to take an SAE sentence like “She doesn’t have a camera” and break it down into its discrete parts. “We might identify that there’s a negation in there (‘not’) and that the verb ‘do’ is connected to that negation.” By analyzing parts of speech in this way, as opposed to just vocabulary, Ziems believes he and the research team have built a robust and comprehensive framework to achieve dialect invariance, meaning constant performance across dialect shifts.
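The breakdown Ziems describes for “She doesn’t have a camera” can be sketched in a few lines of Python. This is an illustrative assumption, not the team's implementation: the `Tagged` tuple, the tag names, and `find_do_negation` are hypothetical, and the sentence is tagged by hand. The function only locates the negation and the “do” auxiliary attached to it; which rewrite a given dialect rule would then apply is not stated in the article, so none is shown.

```python
from typing import List, NamedTuple, Optional, Tuple


class Tagged(NamedTuple):
    text: str
    lemma: str
    pos: str  # coarse part-of-speech tag, assigned by hand here


def find_do_negation(tokens: List[Tagged]) -> Optional[Tuple[int, int]]:
    """Return (index of the 'do' auxiliary, index of the negation), or None if absent."""
    for i, tok in enumerate(tokens):
        if tok.lemma == "not" and i > 0:
            prev = tokens[i - 1]
            # The negation counts as "connected" when it directly follows a 'do' auxiliary.
            if prev.pos == "AUX" and prev.lemma == "do":
                return (i - 1, i)
    return None


# Hand-tagged version of the example sentence, with "doesn't" split into two tokens.
sentence = [
    Tagged("She", "she", "PRON"),
    Tagged("does", "do", "AUX"),
    Tagged("n't", "not", "PART"),
    Tagged("have", "have", "VERB"),
    Tagged("a", "a", "DET"),
    Tagged("camera", "camera", "NOUN"),
]

match = find_do_negation(sentence)
if match is not None:
    aux_i, neg_i = match
    print(sentence[aux_i].lemma, sentence[neg_i].lemma)  # do not
```

Matching on lemmas and tags rather than on the literal string “doesn’t” is what lets the same check fire for “don’t” or “didn’t”, which is the point of working with grammar instead of vocabulary.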

Limitations and Next Steps

Though Ziems’ work takes an important step in exposing challenges with language variants, he is quick to acknowledge its limitations. “Dialects aren’t nicely bounded, fixed entities. It’s impossible to stop language from changing and acquiring new features, and even the linguistic observations of these features can shift in frequency between speakers from different regions.” 

Even so, Multi-VALUE allows researchers working with language technologies to build parallel datasets that can be used to train and augment their work. “It’s very inefficient to train a new model for every dialect. It’s just not practical for the real world, so moving forward, we can use these tools to train smaller portions of the model — called adapters — which are swappable and can change to adapt to a certain dialect.” 
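The swappable-adapter idea can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: the bottleneck form `h + up(relu(down(h)))` is one common adapter design rather than something the article specifies, the dialect keys are placeholders, and the weights are made-up numbers instead of trained values.

```python
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class Adapter:
    """Small bottleneck module added on top of a frozen layer: h + up(relu(down(h)))."""

    def __init__(self, down: Matrix, up: Matrix) -> None:
        self.down = down
        self.up = up

    def __call__(self, h: Vector) -> Vector:
        z = [max(0.0, x) for x in matvec(self.down, h)]  # project down, then ReLU
        delta = matvec(self.up, z)                       # project back up
        return [a + b for a, b in zip(h, delta)]         # residual connection


def base_layer(x: Vector) -> Vector:
    """Stand-in for a shared, frozen model layer (toy arithmetic)."""
    return [2.0 * x[0], x[1] + 1.0]


# Placeholder dialect keys with toy weights; only these small modules differ per dialect.
adapters: Dict[str, Adapter] = {
    "dialect_a": Adapter(down=[[1.0, 0.0]], up=[[0.5], [0.0]]),
    "dialect_b": Adapter(down=[[0.0, 1.0]], up=[[0.0], [-1.0]]),
}


def forward(x: Vector, dialect: str) -> Vector:
    """Run the shared layer, then the adapter for the requested dialect if one exists."""
    h = base_layer(x)
    adapter = adapters.get(dialect)
    return adapter(h) if adapter is not None else h


x = [1.0, 2.0]
print(forward(x, "dialect_a"))  # [3.0, 3.0]
print(forward(x, "dialect_b"))  # [2.0, 0.0]
print(forward(x, "unknown"))    # [2.0, 3.0]
```

The shared `base_layer` is never modified; supporting another dialect means adding one more small entry to `adapters`, which is the efficiency argument made in the quote above.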

Ziems also points out the considerable work already being done in the field of NLP in the English language. “Half the world’s population is bilingual or multilingual, and English is just one of the many tools in the toolkit. But it has such an outsized impact on the global economy, and it’s a language where we can efficiently and effectively capture this problem of language equity.”

Stanford HAI’s mission is to advance AI research, education, policy and practice to improve the human condition.

Contributor(s)
Prabha Kannan
