Ensuring the Fairness of Algorithms that Predict Patient Disease Risk

Date: July 27, 2022
Topics: Healthcare

Decision-support tools for helping physicians follow clinical guidelines are increasingly using artificial intelligence, highlighting the need to remove bias from underlying algorithms.

"To treat or not to treat?" is the question continually faced by clinicians. To help with their decision making, some turn to disease risk prediction models. These models forecast which patients are more or less likely to develop disease and thus could benefit from treatment, based on demographic factors and medical data.

With the growth of these tools across the medical field and especially in this area of clinical guidance, researchers at Stanford and elsewhere are grappling with how to ensure the fairness of the models' underlying algorithms. Bias has emerged as a significant problem when models are not developed using data reflecting diverse populations.

In a new study, Stanford researchers examined important clinical guidelines for cardiovascular health that advise the use of a risk calculator to guide prescription decisions for Black women, white women, Black men, and white men. The researchers looked at two approaches that have been proposed for improving the fairness of the calculator's algorithms. One, known as group recalibration, readjusts the risk model for each subgroup of patients so that its predictions better match the frequency of observed outcomes. The second, called equalized odds, seeks to ensure that error rates are similar across all groups. The researchers found that the recalibration approach produced the better overall match with the guidelines' recommendations.

The findings underscore the importance of building algorithms that take into account the full context relevant to the populations they serve.

"While machine learning has a lot of promise in medical settings and other social contexts, there is the potential for these technologies to worsen existing health inequities," says Agata Foryciarz, a Stanford PhD student in computer science and lead author of the study published in BMJ Health & Care Informatics. "Our results suggest that evaluating disease risk prediction models for fairness can make their use more responsible."

In addition to Foryciarz, the researchers include senior author Nigam Shah, Chief Data Scientist for Stanford Health Care and a Stanford HAI faculty member; Google Research Scientist Stephen Pfohl; and Google Health Clinical Specialist Birju Patel.

Prudent Prevention

The clinical guidelines evaluated in the study are for the primary prevention of atherosclerotic cardiovascular disease. This condition is caused by fats, cholesterol, and other substances building up as so-called plaques on the walls of arteries. The sticky plaques block blood flow and potentially lead to adverse outcomes including strokes and kidney failure.  

Read the study: "Evaluating algorithmic fairness in the presence of clinical guidelines: the case of atherosclerotic cardiovascular disease risk estimation"

 

The guidelines, put out by the American College of Cardiology and the American Heart Association, provide recommendations for when patients should start medications called statins — drugs that reduce the levels of certain cholesterol that lead to arterial buildup.

The atherosclerotic cardiovascular disease guidelines take into account medical measures including blood pressure, cholesterol levels, diabetes diagnoses, smoking status, and hypertension treatment, along with the demographics of sex, age, and race. Based on these data, the guidelines suggest the use of a calculator that then estimates patients' overall risk of developing cardiovascular disease within 10 years. Patients identified as being at intermediate or high risk of disease are advised to initiate statin treatment. For patients who are instead at borderline or low risk of disease, statin therapy could be unnecessary or unwanted given potential medication side effects.
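
As a concrete illustration, the decision logic the guidelines describe can be sketched as a simple threshold function. The cutoffs below (low under 5%, borderline 5-7.5%, intermediate 7.5-20%, high 20% and above) follow the ACC/AHA risk bands, but the function itself is a simplified sketch for illustration, not the guideline algorithm, which also weighs risk enhancers and patient preference.

```python
def statin_recommendation(ten_year_risk: float) -> str:
    """Map a 10-year ASCVD risk estimate (0-1) to a guideline risk band.

    Simplified for illustration: the actual guideline also considers
    risk enhancers, diabetes status, LDL levels, and patient preference
    before any statin decision is made.
    """
    if ten_year_risk >= 0.20:
        return "high risk: statin therapy recommended"
    if ten_year_risk >= 0.075:
        return "intermediate risk: statin therapy generally advised"
    if ten_year_risk >= 0.05:
        return "borderline risk: weigh risk enhancers with the patient"
    return "low risk: statin generally not indicated"
```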

"If you as a patient are perceived to be higher risk than you actually are, you can be put on a statin that you don't need," says Foryciarz. "Then on the other hand, if you're predicted to be low risk but you really should be on a statin, doctors might fail to put preventive measures in place that could have prevented heart disease later on."

Clinical practice guidelines increasingly recommend that physicians use clinical risk prediction models for various conditions and patient populations. The proliferation of medical decision-support calculators, for instance on phones and other devices used in clinical settings, means such tools are often right at hand.

"Clinicians are likely to encounter and use more and more of these algorithm-based decision-support tools, so it's important that designers try to ensure the tools are as fair and accurate as possible," says Foryciarz.

Refining Risk Assessment

For their study, Foryciarz and colleagues used a cohort of more than 25,000 patients aged 40 to 79, drawn from several large datasets. The researchers compared the patients' actual incidence of atherosclerotic cardiovascular disease with the predictions made by risk models. As part of these experiments, they built models using the two approaches, group recalibration and equalized odds, and then compared the risk estimates those models' calculators generated with those from a simple model calculator with no fairness adjustment.
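
One concrete way to make such a comparison is a per-subgroup calibration check: the mean predicted risk set against the observed event rate. Below is a minimal sketch in Python; the column names are hypothetical, not taken from the study.

```python
import pandas as pd

# df holds hypothetical columns: 'group' (demographic subgroup),
# 'predicted_risk' (model's 10-year risk estimate, 0-1), and
# 'event' (1 if the patient developed ASCVD within 10 years).
def calibration_by_group(df: pd.DataFrame) -> pd.DataFrame:
    """Compare mean predicted risk with the observed event rate per
    subgroup; a well-calibrated model shows the two closely matched
    in every row."""
    return df.groupby("group").agg(
        mean_predicted=("predicted_risk", "mean"),
        observed_rate=("event", "mean"),
        n=("event", "size"),
    )
```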

Recalibrating separately for each of the four subgroups involved running the model on a subset of each subgroup, measuring the actual percentage of those patients who developed disease, and then adjusting the underlying model for the broader subgroup. This approach successfully improved the model's agreement with the guidelines for patients at low levels of risk. On the other hand, differences in error rates between the subgroups emerged, especially at the high-risk end.
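
A common way to implement this kind of group recalibration is Platt-style rescaling: fitting a small logistic model that maps the base model's scores to observed outcomes, separately within each subgroup. The sketch below illustrates that general technique under assumed inputs; it is not the exact procedure used in the study.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_group_recalibrators(scores, events, groups):
    """Fit one Platt-style recalibration model per subgroup.

    scores : array of base-model 10-year risk estimates in (0, 1)
    events : array of observed outcomes (1 = developed ASCVD)
    groups : array of subgroup labels (e.g., "Black women")
    """
    # Work on the logit scale, as is standard for Platt scaling.
    logits = np.log(scores / (1.0 - scores)).reshape(-1, 1)
    recalibrators = {}
    for g in np.unique(groups):
        mask = groups == g
        clf = LogisticRegression()
        clf.fit(logits[mask], events[mask])  # assumes both outcomes occur in each subgroup
        recalibrators[g] = clf
    return recalibrators

def recalibrated_risk(recalibrators, score, group):
    """Return the subgroup-adjusted risk estimate for one patient."""
    logit = np.log(score / (1.0 - score))
    return recalibrators[group].predict_proba([[logit]])[0, 1]
```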

The equalized odds approach, in contrast, required building a new predictive model that was constrained to yield equalized error rates across populations. In practice, this approach achieves similar false-positive and false-negative rates across populations. A false positive refers to a patient who was identified as high risk and would be started on a statin, but who did not develop atherosclerotic cardiovascular disease, while a false negative refers to a patient identified as low risk, but who did develop atherosclerotic cardiovascular disease and would likely have benefited from taking a statin.
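
Those per-group error rates are straightforward to measure once a treatment threshold is fixed. A minimal sketch, with hypothetical array names and an illustrative 7.5% cutoff; under equalized odds, a constrained model would push the FPR and FNR entries toward equality across groups, whereas this code only measures the disparity.

```python
import numpy as np

def error_rates_by_group(risk, events, groups, threshold=0.075):
    """False-positive and false-negative rates per subgroup at a
    single statin-initiation threshold (7.5% is used here purely
    as an illustrative cutoff)."""
    flagged = risk >= threshold
    rates = {}
    for g in np.unique(groups):
        m = groups == g
        negatives = m & (events == 0)  # never developed disease
        positives = m & (events == 1)  # did develop disease
        rates[g] = {
            "FPR": flagged[negatives].mean(),    # flagged but stayed healthy
            "FNR": (~flagged[positives]).mean(), # missed true cases
        }
    return rates
```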

Going with this equalized odds approach ultimately skewed the decision threshold levels for the various subgroups. Compared with the group recalibration approach, using the calculator built with equalized odds in mind would have led to more under- and over-prescribing of statins and would fail to potentially prevent some of the adverse outcomes.

The gain in accuracy with group recalibration does require additional time and effort to adjust the original model versus leaving it as-is, though this is a small price to pay for improved clinical outcomes. An additional caveat is that dividing a population into subgroups increases the chance that a subgroup's sample becomes too small to assess its risks reliably, while also lessening the ability to extend the model's predictions to other subgroups.

Overall, algorithm designers and clinicians alike should keep in mind which fairness metrics to use for evaluation and which, if any, to use for model adjustment. They should also understand how a model or calculator is going to be used in practice and how erroneous predictions could lead to clinical decisions that can generate adverse health outcomes down the line. Awareness of potential bias and further development of fairness approaches for algorithms can improve outcomes for all, Foryciarz notes.   

"While it’s not always easy to identify which of possibly many subgroups to focus on, considering some subgroups is better than not considering any," Foryciarz says. "Developing algorithms to serve a diverse population means that the algorithms themselves have to be developed with that diversity in mind."

 

This is part of a healthcare AI series. Read more about:

  • How do we ensure that healthcare AI is useful?

  • Does every model need to be explainable?

  • Do healthcare models need to be generalizable?

  • What should healthcare executives know before they implement an AI tool?

  • Are medical AI tools delivering on what they promise?

  • Does deidentification of medical records protect our privacy? 

  • How do we make sure healthcare algorithms are fair?

Stanford HAI’s mission is to advance AI research, education, policy and practice to improve the human condition. Learn more.

Contributor(s)
Adam Hadhazy

Related News

AI Reveals How Brain Activity Unfolds Over Time
Andrew Myers
Jan 21, 2026
News

Stanford researchers have developed a deep learning model that transforms overwhelming brain data into clear trajectories, opening new possibilities for understanding thought, emotion, and neurological disease.

Why 'Zero-Shot' Clinical Predictions Are Risky
Suhana Bedi, Jason Alan Fries, and Nigam H. Shah
Jan 07, 2026
News

These models generate plausible timelines from historical patterns; without calibration and auditing, their "probabilities" may not reflect reality.

Stanford Researchers: AI Reality Check Imminent
Forbes
Dec 23, 2025
Media Mention

Shana Lynch, HAI Head of Content and Associate Director of Communications, noted that the "era of AI evangelism is giving way to an era of AI evaluation" in her AI predictions piece, in which she interviewed several Stanford AI experts on their insights for AI impacts in 2026.