Toward Fairness in Health Care Training Data

With recent advances in artificial intelligence (AI), researchers can now train sophisticated computer algorithms to interpret medical images, often with accuracy comparable to that of trained physicians. Yet our recent survey of medical research shows that these algorithms rely on datasets that lack population diversity and could therefore introduce bias into how a patient’s health condition is assessed.
Key Takeaways
Bias arises when we build algorithms using datasets that do not mirror the population. When generalized to broader swaths of the population, these nonrepresentative data can distort research findings.
In the studies we surveyed, the vast majority of the health data used to build AI algorithms came from only three states, with little or no representation from the remaining 47 states.
Policymakers, regulators, industry, and academia need to work together to ensure that medical AI data reflect America’s diversity, not only in geography but also in other important patient attributes. To that end, nationwide data sharing initiatives should be a top priority.