Is that spot on the X-ray or CT scan something to worry about? It is a question that, thankfully, most people never have to face. But for those who do, the answer can be life-altering.
Increasingly, doctors are calling on artificial intelligence to help diagnose conditions ranging from cancer and heart attack to sepsis and traumatic brain injuries. While AI can sometimes spot concerns that a human might miss, it’s not perfect. AI diagnostic aids sometimes recommend unnecessary invasive procedures or miss something that should have been flagged.
These two kinds of mistakes have very different consequences. Deciding whether a scan requires further investigation or not means choosing between risks: the risk of doing an unnecessary procedure and the risk of missing a serious condition. Different patients weight these risks differently. Currently, however, the patient’s own values and preferences about how to weight these risks are not always part of the AI decision-making calculation.
This ethical conundrum raises profound questions for doctors and AI programmers alike, says ethicist and Stanford Institute for Human-Centered AI fellow Kathleen Creel. With a timely commentary in the journal Nature Medicine, Creel and a multidisciplinary team of co-authors with expertise in radiology, philosophy, and AI say the resolution to that dilemma is clear—AI should put the values of the patient first.
Read the full commentary: "Clinical Decisions Using AI Must Consider Patient Values"
“In a clinical setting, there is no one-size-fits-all approach to diagnostics. AI, as a field, should accommodate this reality by being flexible to the patient’s personal perspectives on risk,” Creel says.
In a borderline case, should a doctor tell the patient there is a concern and risk an unnecessary, potentially invasive procedure that could turn out to be nothing? (A false positive result.) Or should that same doctor, knowing the patient’s preference is to avoid an unnecessary procedure at all costs, tell the patient there is nothing to worry about when, in fact, there could be plenty to worry about? (A false negative result.)
Risk-averse patients prefer to do anything they can to avoid a false negative. Others really don’t want to undergo an avoidable surgery. “That's their priority—keep me out of the hospital,” Creel says. “AI designers must build in sensitivity and flexibility to address both types of patients equally and fairly.”
Playing the Percentages
AI in medical devices typically calculates a probability—the likelihood that a spot in a scan is cancer or some other disease—and then recommends to the doctor whether to investigate further.
There are three general approaches to using AI in these circumstances. The first is the status quo: The algorithm calculates the probability of concern and, if it exceeds a threshold—say, better than 80 percent likelihood of cancer—the patient is automatically recommended for follow-up. This approach relies exclusively on AI without human input. The doctor sees only the AI’s recommendation, not the probability behind it or the threshold, which is often set by programmers.
In a second approach, AI directly incorporates a patient’s values and attitude toward risk and uses this information to set a personalized threshold as to whether to proceed or not. This approach incorporates patient values, but not the doctor’s clinical judgment.
In the third approach, AI not only provides the recommendation but also tells the doctor the probability of disease. It is then up to the doctor’s expertise and knowledge of the patient’s wishes whether to recommend follow-up.
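The three approaches above can be sketched in a few lines of code. This is a purely illustrative sketch: the function names, the 80 percent default threshold, and the example probabilities are assumptions for exposition, not details of any real clinical system described in the commentary.

```python
# Hypothetical sketch of the three decision approaches. All names and
# numbers are illustrative assumptions, not a real clinical system.

DEFAULT_THRESHOLD = 0.80  # fixed by programmers in the status-quo approach


def approach_1_status_quo(p_disease: float) -> str:
    """The algorithm alone converts a probability into a recommendation;
    the doctor never sees the probability or the threshold."""
    return "follow-up" if p_disease >= DEFAULT_THRESHOLD else "no follow-up"


def approach_2_patient_threshold(p_disease: float, patient_threshold: float) -> str:
    """Patient values set a personalized threshold, but the doctor's
    clinical judgment is still bypassed."""
    return "follow-up" if p_disease >= patient_threshold else "no follow-up"


def approach_3_inform_doctor(p_disease: float) -> tuple[str, float]:
    """The AI reports both its recommendation and the raw probability,
    leaving the final call to the doctor and patient."""
    recommendation = "follow-up" if p_disease >= DEFAULT_THRESHOLD else "no follow-up"
    return recommendation, p_disease


print(approach_1_status_quo(0.85))               # follow-up
print(approach_2_patient_threshold(0.85, 0.90))  # no follow-up (treatment-averse patient)
print(approach_3_inform_doctor(0.85))            # ('follow-up', 0.85)
```

Note that in the third approach the same 0.85 probability is no longer a verdict but one input among several: the doctor can weigh it against the patient's stated preferences.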
Creel and her co-authors express concerns about the first two approaches and recommend the third. The first unacceptably ignores patient values and wishes. Both the first and second approaches leave the doctor out of the decision, which patient focus groups at both Stanford and Washington University rejected. Instead, patients preferred variants of the third approach, in which doctors combine patient values with the AI’s probability estimate to set a personalized threshold for each patient.
Tuning in to Patient Concerns
Creel and colleagues argue that thresholds for converting a probability into a medical decision should be based on specific patient values. Patients should take a brief pre-examination survey that probes their reactions to hypothetical outcomes. Such a survey would reveal their attitudes toward over- and under-diagnosis, their worries about false-positive and false-negative results, their concerns about over- and under-treatment, and quality-of-life issues should they require treatment.
For instance, the questionnaire might ask a patient to respond to statements like: “I would rather risk surgical complications to treat a benign tumor than risk missing a cancerous tumor.”
It is not unlike tuning a radio: the threshold could be dialed up or down to match a patient’s relative risk-tolerance score. An algorithm might be set to a threshold of, say, 90 percent or higher for a treatment-averse patient. For a risk-averse patient whose greatest fear is a false negative, the threshold might be lowered to flag concerns more often.
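One simple way to picture this dial is a linear mapping from a survey-derived score to a threshold. Everything here is an assumption for illustration: the 0-to-1 score scale, the 50 and 90 percent endpoints, and the function name are hypothetical, not values proposed by the authors.

```python
# Illustrative sketch only: maps a hypothetical risk-tolerance score
# (0.0 = most risk-averse, fears false negatives;
#  1.0 = most treatment-averse, fears unnecessary procedures)
# onto a personalized decision threshold. Bounds are assumed, not clinical.

LOW_THRESHOLD = 0.50   # flags concerns often, for risk-averse patients
HIGH_THRESHOLD = 0.90  # flags only strong signals, for treatment-averse patients


def personalized_threshold(risk_tolerance: float) -> float:
    """Linearly interpolate between the two threshold extremes."""
    risk_tolerance = min(max(risk_tolerance, 0.0), 1.0)  # clamp score to [0, 1]
    return LOW_THRESHOLD + risk_tolerance * (HIGH_THRESHOLD - LOW_THRESHOLD)


# A treatment-averse patient gets a high bar before follow-up is suggested:
print(personalized_threshold(1.0))  # 0.9
# A risk-averse patient is flagged more often:
print(personalized_threshold(0.0))  # 0.5
```

A real system would need a validated survey instrument and clinically grounded endpoints, but the shape of the mechanism is this simple: one score in, one threshold out.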
“Patients deserve to have their values reflected in this debate and in the algorithms,” Creel says. “Adding a degree of patient advocacy would be a positive step in the evolution of AI in medical diagnostics.”
Stanford HAI's mission is to advance AI research, education, policy and practice to improve the human condition.