Toward Fairness in Health Care Training Data | Stanford HAI

Policy Brief

Toward Fairness in Health Care Training Data

Date
October 01, 2020
Topics
Healthcare
Ethics, Equity, Inclusion
Read Paper
Abstract

This brief highlights the lack of geographic representation in medical-imaging AI training data and calls for nationwide, diversity-focused data-sharing initiatives.

Key Takeaways

  • Bias arises when we build algorithms on datasets that don’t mirror the population. When findings are generalized to larger swaths of the population, these nonrepresentative data can confound research results.

  • The vast majority of the health data used to build AI algorithms came from only three states, with little or no representation from the remaining 47 states.

  • Policymakers, regulators, industry, and academia need to work together to ensure medical AI data reflect America’s diversity across not only geography but also many other important attributes. To that end, nationwide data sharing initiatives should be a top priority.

Executive Summary

With recent advances in artificial intelligence (AI), researchers can now train sophisticated computer algorithms to interpret medical images, often with accuracy comparable to that of trained physicians. Yet our recent survey of medical research shows that these algorithms rely on datasets that lack population diversity and could introduce bias into the understanding of a patient’s health condition.

Artificial intelligence algorithms increasingly inform the decisions of human experts. In medical imaging, these algorithms may help a doctor spot a subtle finding or suggest an alternate diagnosis. But bias in the data used to train these high-stakes algorithms can bias the algorithm itself. Our analysis shows that the datasets used to develop these algorithms come from only a handful of locations, raising serious questions for policymakers but also providing opportunities for course correction.

In our research, published in the Journal of the American Medical Association, we looked at data from more than 70 studies that used U.S. patient data to train algorithms designed to compete or collaborate with physicians to perform diagnostic tasks. Overwhelmingly, the datasets came from three states—California, Massachusetts, and New York—with little or no representation from the remaining 47 states. Rectifying this lack of representation in medical data should be front of mind for health policymakers and regulators. Lack of data diversity can be addressed in part by initiatives to streamline the nation’s digital infrastructure, to enhance the availability of patient data from underrepresented populations for larger studies, and to incentivize ethical data sharing and the democratization of medical data.

Authors
  • Amit Kaushal
  • Russ Altman
  • Curtis Langlotz

Related Publications

How Can AI Support Language Digitization and Digital Inclusion?
Juan Pava, Thomas S. Mullaney, Caroline Meinhardt, Audrey Gao, Diyi Yang
White Paper | Deep Dive | Feb 26, 2026

This white paper analyzes the varying ways AI tools can advance language digitization work, and provides recommendations for responsibly realizing the potential of AI in supporting the digital inclusion of digitally disadvantaged languages.


Toward Responsible AI in Health Insurance Decision-Making
Michelle Mello, Artem Trotsyuk, Abdoul Jalil Djiberou Mahamadou, Danton Char
Policy Brief | Quick Read | Feb 10, 2026

This brief proposes governance mechanisms for the growing use of AI in health insurance utilization review.


Response to FDA's Request for Comment on AI-Enabled Medical Devices
Desmond C. Ong, Jared Moore, Nicole Martinez-Martin, Caroline Meinhardt, Eric Lin, William Agnew
Response to Request | Quick Read | Dec 02, 2025

Stanford scholars respond to a federal RFC on evaluating AI-enabled medical devices, recommending policy interventions to help mitigate the harms of AI-powered chatbots used as therapists.


Moving Beyond the Term "Global South" in AI Ethics and Policy
Evani Radiya-Dixit, Angèle Christin
Issue Brief | Quick Read | Nov 19, 2025

This brief examines the limitations of the term "Global South" in AI ethics and policy, and highlights the importance of grounding such work in specific regions and power structures.
