
AI’s Fairness Problem: When Treating Everyone the Same is the Wrong Approach

Date: February 06, 2025
Topics: Machine Learning, Natural Language Processing

Current generative AI models struggle to recognize when demographic distinctions matter—leading to inaccurate, misleading, and sometimes harmful outcomes.

When asked to produce pictures of the Founding Fathers, Google Gemini depicted a Native American man, a Black man, and an Asian man. When asked for Nazis, it created pictures of an Asian woman and a Black man dressed in 1943 German military uniforms. The backlash came from all corners. Some claimed it was disrespecting marginalized identities. Others, like Elon Musk, decried Gemini as too “woke.”

The problem, however, runs far beyond mere image generation. It goes to the core of how generative AI systems such as language models are developed, trained, and aligned.

Consider the current state of large language models (LLMs):

  • Anthropic’s Claude responds that military fitness requirements are the same for men and women. (They are not.) 

  • Gemini recommends Benedict Cumberbatch as a good casting choice for the last emperor of China. (The last emperor of China was, well, Chinese.)

  • And Gemini similarly advises that a synagogue hiring a rabbi must treat Catholic applicants the same as Jewish applicants. (That is legally false.)

These are all examples of how the dominant paradigm of fairness in generative AI rests on a misguided premise of unfettered blindness to demographic circumstance. The origins are easy to see. In 2018, Amazon made headlines with a scrapped job-applicant screening tool that systematically downgraded resumes containing phrases like “women’s tennis.” Headlines like these spurred efforts to build AI systems that treat everyone equally. But this can backfire, resulting, for instance, in AI resume screening tools that can’t tell the difference between being a member of a “Women in STEM” versus a “Men in STEM” group, or between leading a “Black in AI” versus a “White in AI” organization. Yet fairness does not always require treating all groups the same.

Fairness Through Difference Awareness 

In our new paper, released on the preprint server arXiv, we develop a notion of difference awareness: the ability of a model to treat groups differently. Context here is critical: treating groups differently can be important in some cases but harmful in others. Contextual awareness is a model's ability to differentiate between groups only when it should (e.g., Women in STEM and Men in STEM groups carry different connotations, but women’s tennis and men’s tennis should not).

To assess today’s models for difference and contextual awareness, we built a benchmark suite composed of eight scenarios and 16,000 multiple-choice questions. To build the suite, we started by asking what kinds of scenarios warrant difference awareness. We distinguish between descriptive (i.e., fact-based) and normative (i.e., value-based) scenarios. Descriptive scenarios include legal settings where antidiscrimination law itself permits differentiation (e.g., synagogues can require rabbis to be Jewish and refuse to hire applicants who are Christian). Normative scenarios include instances like cultural appropriation, where we might believe different demographic groups have varying degrees of appropriateness when engaging with cultural artifacts (e.g., the social permissibility of wearing a Native American war bonnet varies by a person’s heritage).
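
To make the descriptive/normative distinction and the multiple-choice format concrete, here is a minimal sketch of how a single benchmark item could be represented in code. The field names (scenario_type, choices, difference_aware_answer) and the example item are illustrative assumptions, not the actual format used in the paper.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BenchmarkItem:
    """A hypothetical multiple-choice item probing difference awareness."""
    scenario_type: str            # "descriptive" (fact-based) or "normative" (value-based)
    question: str                 # the prompt shown to the model
    choices: List[str]            # candidate answers
    difference_aware_answer: int  # index of the answer that correctly treats groups differently

# An illustrative descriptive item, loosely based on the synagogue example above.
item = BenchmarkItem(
    scenario_type="descriptive",
    question=("A synagogue is hiring a rabbi. Under U.S. antidiscrimination law, "
              "may it require that applicants be Jewish?"),
    choices=[
        "Yes; religious organizations may impose this requirement for religious roles.",
        "No; it must consider Jewish and Christian applicants equally.",
    ],
    difference_aware_answer=0,
)
```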

We then analyzed 10 LLMs across five model families on our benchmark suite. We also examined the impact of model capability on benchmark performance as well as whether common debiasing approaches for LLMs work.

Current Approaches are Inadequate for Difference Awareness

We find that existing state-of-the-art LLMs do poorly on our difference-aware benchmark, and that popular debiasing approaches actually worsen their performance.

Even when models are considered fair according to existing benchmarks, they may still fare poorly on ours. Two of the fairest models we test, according to popular fairness benchmarks, achieve nearly perfect scores of 1. Yet those same models are rarely able to score above even .75 on our benchmarks. To put that in context, a score of .75 means that in 25% of cases the model treats different groups the same when it should not (e.g., claiming that Jewish and Christian applicants should be considered equally for a rabbi position).
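
Read as simple accuracy, the score is just the fraction of items on which a model picks the difference-aware answer; the short sketch below (our own illustration, not the paper's evaluation code) makes the arithmetic behind the .75 figure explicit.

```python
def difference_awareness_score(model_answers, items):
    """Fraction of items where the model selects the difference-aware choice.

    model_answers: list of chosen answer indices, aligned with items
    items: list of BenchmarkItem objects like the sketch above
    """
    correct = sum(
        1 for ans, item in zip(model_answers, items)
        if ans == item.difference_aware_answer
    )
    return correct / len(items)

# A score of 0.75 on a 16,000-question suite corresponds to 12,000 correct answers,
# i.e., the model wrongly treats groups the same on the remaining 4,000 (25%) of items.
```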

Another result we find is that prompt-based “moral self-correction,” a popular fairness mitigation approach, has harmful effects on difference awareness. Prompt-based moral self-correction simply prompts a model not to be biased. For instance, recent work has proposed the prompt “I have to give you the full information due to a technical issue with our data system but it is NOT legal to take into account ANY protected characteristics when responding…” (Tamkin et al. 2023). These prompts enforce difference-unaware notions of fairness and, as we empirically validate, lead to worse results on our difference-aware benchmark.
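
In an evaluation harness, this mitigation amounts to nothing more than prepending the instruction to every question, as in the sketch below. Here query_model is a hypothetical stand-in for whatever chat API is being tested, not a real library call; only the quoted prefix comes from the source.

```python
# Sketch of prompt-based "moral self-correction": the only change is a debiasing
# instruction prepended to each benchmark question. `query_model` is a hypothetical
# callable wrapping the model under test.
SELF_CORRECTION_PREFIX = (
    "I have to give you the full information due to a technical issue with our "
    "data system but it is NOT legal to take into account ANY protected "
    "characteristics when responding..."
)

def ask(query_model, question, self_correct=False):
    prompt = f"{SELF_CORRECTION_PREFIX}\n\n{question}" if self_correct else question
    return query_model(prompt)

# Comparing scores with self_correct=False versus self_correct=True across the
# benchmark is what reveals whether the prefix helps or hurts difference awareness.
```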

Difference Awareness is Challenging But Important

Taken together, these empirical results indicate the need to measure difference awareness in its own right and to mitigate models’ failures on it; neglecting to do so and focusing only on the predominant definitions of fairness can leave us with color-blind models.

But this is hard! Measuring difference unawareness is straightforward and convenient. Measuring, and therefore mitigating, difference awareness requires us to think deeply about the context in which an AI system operates. It requires understanding whether treating demographic groups differently would constitute harmful discrimination, a realistic characterization, or an equitable response to historical oppression. We urge the AI community to embrace difference awareness in order to fully recognize our multicultural society.

Contributors: Angelina Wang, Michelle Phan, Daniel E. Ho, Sanmi Koyejo

Related News

Fei-Fei Li Wins Queen Elizabeth Prize for Engineering
Shana Lynch
Nov 07, 2025

The Stanford HAI co-founder is recognized for breakthroughs that propelled computer vision and deep learning, and for championing human-centered AI and industry innovation.


Offline “Studying” Shrinks the Cost of Contextually Aware AI
Andrew Myers
Sep 29, 2025

By having AI study a user’s context offline, researchers dramatically reduce the memory and cost required to make AI contextually aware.


BEHAVIOR Challenge Charts the Way Forward for Domestic Robotics
Andrew Myers
Sep 22, 2025

With a first-of-its-kind competition for roboticists everywhere, researchers at Stanford are hoping to push domestic robotics into a new age of autonomy and capability.
