HAI Policy Briefs
October 2020
Toward Fairness in Health Care Training Data
With recent advances in artificial intelligence (AI), researchers can now train sophisticated computer algorithms to interpret medical images – often with accuracy comparable to that of trained physicians. Yet our recent survey of medical research shows that these algorithms rely on datasets that lack population diversity and could introduce bias into the understanding of a patient’s health condition.
Key Takeaways
➜ Bias arises when we build algorithms using datasets that don’t mirror the population. When generalized to larger swathes of the population, these nonrepresentative data have the potential to confound research findings.
➜ In the research we surveyed, the vast majority of the health data used to build AI algorithms came from only three states, with little or no representation from the remaining 47 states.
➜ Policymakers, regulators, industry, and academia need to work together to ensure medical AI data reflect America’s diversity across not only geography but also many other important attributes. To that end, nationwide data sharing initiatives should be a top priority.
Authors
Amit Kaushal - Stanford University
Russ Altman - Stanford University
Curtis Langlotz - Stanford University