Risks of AI Race Detection in the Medical System

This brief warns that AI systems that infer patients’ race in medical settings could deepen existing healthcare disparities.
Key Takeaways
Algorithms that infer a patient’s race without medical professionals even knowing it may exacerbate already serious disparities in health outcomes and patient care between racial groups.
Technical “de-biasing” techniques often discussed for other algorithms, such as distorting inputs (e.g., altering images), may have little effect on medical imaging AI; a sketch of such a distortion test follows this list.
This research was made possible only by the efforts of several universities and hospitals to make open medical data a public good, allowing our researchers to explore important research questions free of conflicts with commercial interests.
Future work on AI medical imaging regulation and approval should include audits explicitly focused on evaluating an algorithm’s performance on data that includes racial identity, sex, and age.
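One way to probe the distortion takeaway above is to degrade images and check whether a model’s race predictions actually weaken. The sketch below, in Python with PyTorch, is purely illustrative: the ResNet-18 classifier, random images, and alternating labels are hypothetical stand-ins, not the consortium’s models or data.

```python
# Hypothetical distortion test: blur images with increasing strength and
# see whether a race classifier's discrimination (AUC) falls toward chance.
import torch
import torchvision.transforms.functional as TF
from torchvision.models import resnet18
from sklearn.metrics import roc_auc_score

model = resnet18(num_classes=2)  # stand-in binary race classifier
model.eval()

# Stand-in batch: 32 fake "X-rays" with alternating hypothetical labels.
images = torch.rand(32, 3, 224, 224)
labels = torch.tensor([0, 1] * 16)

for sigma in [0.0, 2.0, 4.0, 8.0]:  # increasing Gaussian blur strength
    x = images if sigma == 0 else TF.gaussian_blur(images, kernel_size=31, sigma=sigma)
    with torch.no_grad():
        probs = model(x).softmax(dim=1)[:, 1]
    # If the race signal lived only in fine image detail, AUC would fall
    # toward 0.5 as sigma grows.
    print(f"sigma={sigma}: AUC={roc_auc_score(labels.numpy(), probs.numpy()):.3f}")
```

If, as the takeaway above suggests, discrimination stays well above chance even under heavy distortion, then simple input alteration is not a reliable fix for medical imaging AI.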
Executive Summary
Artificial Intelligence (AI) is being deployed for a range of tasks across the medical system, from patient face-scanning to early-stage cancer detection. The U.S. Food and Drug Administration (FDA) and other regulatory bodies around the world are in the process of vetting a range of such algorithms for use. Many in the field hope these AI systems can lower the costs of care, increase the accuracy of medical diagnostics, and boost hospital efficiency, among other benefits.
At the same time, however, AI systems that draw conclusions about demographic information could seriously exacerbate disparities in the medical system, and this is especially true of race. Left unexamined and unchecked, algorithms that assess patients’ racial identity, whether accurately or inaccurately, could worsen long-standing inequities in the quality and cost of, and access to, care.
Extensive research has already documented that facial and image recognition systems are often more accurate at recognizing lighter-skinned faces than darker-skinned ones. In practice, this has led to facial recognition systems that misidentify one Black person as another, or algorithms that fail to detect darker skin tones at all. On the flip side, there has been much discussion of the harm that could be inflicted when AI systems classify race remarkably well: accurate recognition tools can also be used to harm people of color.
A groundbreaking series of findings was recently reported by a large international AI research consortium led by Dr. Judy Gichoya, an assistant professor at Emory University, in Reading Race: AI Recognizes Patient’s Racial Identity In Medical Images. This work explores how well AI models of the kind already deployed in the medical field can be trained to predict a patient’s race. The investigator team, including researchers from the Stanford Center for Artificial Intelligence in Medicine and Imaging (AIMI), applied multiple commonly deployed machine learning (ML) models to large, publicly and privately available datasets of medical images. These datasets included everything from chest and limb X-rays to CT scans of the lungs to mammogram screenings.
Human experts cannot determine a patient’s race from these medical imaging examinations, so until our study the question had never been seriously investigated; it was not thought possible. To our surprise, we found that AI models can very reliably predict self-reported race from medical images across multiple imaging modalities, datasets, and clinical tasks. Even when we adjusted for characteristics like age, tissue density, and body habitus (physique), the models’ accuracy held. In and of itself, this is concerning, as the capability could be exploited to reproduce or exacerbate racial inequalities in medicine. But the greater risk is that AI systems will trivially learn to predict a patient’s race without a medical professional even realizing it, and reinforce disparate outcomes. Because medical professionals often do not have access to patient race data when performing routine tasks (like a clinical radiologist reviewing a medical image), they would not be able to notice if an algorithm were routinely making bad or harmful decisions based on patient race.
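For readers who want to see the shape of such an experiment, here is a minimal, hypothetical sketch in PyTorch. It assumes a made-up labels.csv mapping image file paths to self-reported race categories and fine-tunes an off-the-shelf DenseNet-121, a common chest X-ray baseline; it is not the consortium’s code.

```python
# Hypothetical sketch: train a standard CNN to predict self-reported race
# from medical images. Paths, labels, and hyperparameters are illustrative.
import csv
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms
from torchvision.models import densenet121
from PIL import Image

class XrayDataset(Dataset):
    """Assumed CSV rows: image_path,race_label_index (e.g., 0/1/2)."""
    def __init__(self, csv_path):
        with open(csv_path) as f:
            self.rows = [(path, int(label)) for path, label in csv.reader(f)]
        self.tf = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.rows)

    def __getitem__(self, i):
        path, label = self.rows[i]
        img = Image.open(path).convert("RGB")  # grayscale X-ray -> 3 channels
        return self.tf(img), label

model = densenet121(num_classes=3)  # head sized to the assumed categories
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

loader = DataLoader(XrayDataset("labels.csv"), batch_size=16, shuffle=True)
model.train()
for epoch in range(5):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
```

The striking finding described above is that even a generic pipeline of this kind, evaluated on held-out patients, can predict self-reported race far more reliably than human experts can.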
Far more than an issue for medical professionals alone, these findings matter for the users, developers, and regulators overseeing AI technologies.