Promoting Algorithmic Fairness in Clinical Risk Prediction
This brief examines the debate on algorithmic fairness in clinical predictive algorithms and recommends paths to safer, more equitable healthcare AI.
Key Takeaways
We studied the trade-offs that clinical predictive algorithms face between accuracy and fairness for outcomes such as in-hospital mortality, prolonged hospital stays, and 30-day readmissions. We found that techniques intended to make these models fairer can degrade the algorithm's performance for every group.
Algorithmic fixes on the developer's side should be only one part of the response. Policymakers should also consider ways to incentivize model developers to engage in participatory design practices that incorporate perspectives from patient advocacy groups and civil society organizations.
Algorithmic fixes may work in some contexts; in others, policymakers may need to mandate that a human remain in the decision-making loop, or the algorithm may not be worth using at all.
Executive Summary
Healthcare providers and medical professionals are increasingly using machine learning to improve how treatment is delivered to patients. From medical image analysis to clinical data processing, these applications will only continue to shape patient-care experiences and medical outcomes. Developers, doctors, patients, and policymakers are just some of the stakeholders grappling with these algorithmic uses.
That said, there is a fundamental problem with machine learning in healthcare: we cannot assume that developers are making concerted efforts to remedy bias and other fairness issues. Discriminatory AI decision-making is concerning in any setting, but it is especially so in clinical settings, where individuals' well-being and physical safety are on the line and where medical professionals face life-or-death decisions every day.
Until now, the conversation about measuring algorithmic fairness in healthcare has focused on fairness in isolation, without fully accounting for how fairness techniques affect clinical predictive models, which are often derived from large clinical datasets. Our new research, published in the Journal of Biomedical Informatics, seeks to ground this debate in evidence and suggests the best way forward for developing fairer machine learning tools in clinical settings.
We explicitly measure trade-offs between the fairness and performance of clinical predictive models. Using three large datasets spanning decades of health outcomes, including in-hospital mortality, prolonged hospital stays, and 30-day readmissions, we evaluated three different notions of fairness across demographic groupings such as race, ethnicity, sex, and age. Overall, we find that improving algorithmic fairness, defined as minimizing differences between demographic groups, lowers performance across multiple metrics. This exposes the many challenges ahead in mitigating the kind of algorithmic bias that has long harmed certain demographic groups in the United States.
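To make the trade-off concrete, the sketch below is a minimal, hypothetical illustration, not our study's actual pipeline or data. It fits a simple classifier to synthetic data, reports overall AUROC and accuracy alongside an equalized-odds-style fairness gap (the difference in true positive rates between two demographic groups), and then applies one common fairness fix: per-group decision thresholds chosen to narrow that gap. All variable names, thresholds, and data here are assumptions made for illustration.

```python
# Minimal, hypothetical sketch of a fairness/performance trade-off.
# All data are synthetic; nothing here comes from the study itself.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 5000
group = rng.integers(0, 2, size=n)                  # hypothetical binary demographic attribute
X = rng.normal(size=(n, 5)) + 0.5 * group[:, None]  # features correlated with group membership
logit = X @ np.array([1.0, -0.5, 0.8, 0.0, 0.3]) + rng.normal(size=n)
y = (logit > 0.5).astype(int)                       # synthetic outcome, e.g. "readmitted"

model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]

def tpr(y_true, y_pred):
    """True positive rate: share of actual positives predicted positive."""
    return y_pred[y_true == 1].mean()

# Baseline: one shared threshold for everyone.
pred = (scores >= 0.5).astype(int)
gap = abs(tpr(y[group == 0], pred[group == 0]) - tpr(y[group == 1], pred[group == 1]))
print(f"shared threshold:     AUROC={roc_auc_score(y, scores):.3f} "
      f"accuracy={(pred == y).mean():.3f} TPR gap={gap:.3f}")

# One simple "fairness fix": per-group thresholds pushing both groups toward
# the same TPR, which typically trades some accuracy for a smaller gap.
target = min(tpr(y[group == g], pred[group == g]) for g in (0, 1))

def threshold_for_tpr(y_true, s, target):
    """Largest threshold whose TPR still reaches the target (illustrative)."""
    for t in np.sort(np.unique(s))[::-1]:
        if tpr(y_true, (s >= t).astype(int)) >= target:
            return t
    return s.min()

fair_pred = np.zeros(n, dtype=int)
for g in (0, 1):
    m = group == g
    fair_pred[m] = (scores[m] >= threshold_for_tpr(y[m], scores[m], target)).astype(int)

fair_gap = abs(tpr(y[group == 0], fair_pred[group == 0]) -
               tpr(y[group == 1], fair_pred[group == 1]))
print(f"per-group thresholds: accuracy={(fair_pred == y).mean():.3f} "
      f"TPR gap={fair_gap:.3f}")
```

In a deployed clinical model the same tension plays out with far higher stakes: narrowing a group gap in one metric generally means moving decision thresholds away from their performance-optimal values for some or all patients.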
Policymakers should recognize that there is no technical solution to unfairness in clinical predictive models that does not decrease accuracy. Consequently, they should consider ways to incentivize responsible algorithm development alongside policies that address broader, structural healthcare inequities, such as those caused by racism and socioeconomic inequality. Clinical predictive models must either be narrowly calibrated to a particular setting or deployed so that a human healthcare provider stays in the decision-making loop to ensure fair patient treatment. If machine learning models do not promote health equity, it may be appropriate to abstain from using an algorithm altogether.