
How Can We Better Regulate Health AI?

HAI Associate Director Curt Langlotz explains the current state of health AI regulation and where it needs to go to protect patients and better assist doctors.

Curt Langlotz, Stanford HAI associate director, speaking at a recent conference. (Photo: Christine Baker)

This May, the Stanford Institute for Human-Centered AI brought together over 50 policymakers, academics, healthcare providers, AI developers, and patient advocates in a closed-door workshop to tackle the regulatory challenges introduced by the rapid integration of AI into the healthcare industry. The day’s conversations included analyzing the gaps in AI device regulation and learning about new AI applications, both patient-facing and in administration and operations.

Read a related story: Pathways to Governing AI Technologies in Healthcare

Curt Langlotz, Stanford HAI associate director and professor of radiology, medicine, and biomedical data science, led the day’s conversation. Here he offers some high-level takeaways and explains how regulation must change to benefit patients, doctors, and developers.

What do regulators need to know about AI?

The Food and Drug Administration (FDA) is our primary federal regulator of clinical AI. They already know a lot about AI and are doing a fine job under difficult constraints, balancing safety and innovation with a 50-year-old regulatory regime designed in the era of paper records and fax machines.

I would emphasize the challenges faced by potential purchasers of AI algorithms right now. In my specialty, radiology, there are over 600 FDA-cleared algorithms and over 100 companies selling AI products to radiologists. We know that these algorithms don’t generalize well to new populations, so many potential customers have a difficult time determining whether a given AI product will work in their practice. We need more transparency about the data on which these products were trained. The FDA is making good progress on that, working in part with an international consortium. In the meeting, we discussed the advantages of model cards and data sheets, and made an analogy to “Table 1” of prospective clinical trial publications, which details the characteristics of the patients on whom a new drug or device is tested.
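To make the analogy concrete, here is a minimal sketch of what a model card capturing training-population characteristics might contain, expressed as a Python dictionary. Every field name and value is a hypothetical illustration, not the FDA’s or any consortium’s actual format.

# A hypothetical model card for an imaging AI product. The fields mirror
# the "Table 1" idea: describe the population the model was developed
# and tested on, so purchasers can judge fit to their own practice.
model_card = {
    "intended_use": "Detect intracranial hemorrhage on non-contrast head CT",
    "training_data": {
        "sites": 3,                          # contributing institutions
        "n_studies": 25_000,                 # training set size
        "age_range": "18-90",
        "sex_distribution": {"female": 0.52, "male": 0.48},
        "scanner_vendors": ["Vendor A", "Vendor B"],
    },
    "external_evaluation": {
        "test_sites": 2,                     # held-out institutions
        "sensitivity": 0.94,                 # illustrative numbers only
        "specificity": 0.89,
    },
}

A prospective purchaser could compare fields like these against their own patient mix and equipment before judging whether the product is likely to generalize to their practice.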

What do AI developers need to better understand about regulation?

Developers have a tendency to think of regulations as a problem to overcome. But in many ways, we are fortunate that health AI is already a regulated industry with a neutral party ensuring that we build safe and effective systems. Lately, we have seen in other industries how a lack of standards can undermine public trust in AI. 

We should also do more to eliminate the wasted effort that occurs when developers aren’t aware of the rigorous evaluations that regulators expect. If developers applied the required rigor from the start, they would avoid the need to re-run experiments later.

Where are the biggest opportunities right now?

Many of the currently available AI algorithms are designed to detect things, whether it’s sepsis, a brain hemorrhage, a lung nodule, or something else. These systems, while marginally helpful, also create extra work for the user, not only to chase down false positives, but also to follow up on the true positives. So healthcare workers who use these algorithms don’t always see them as beneficial.

We are finally seeing a shift toward algorithms that improve efficiency or have a clear return on investment. For example, an algorithm that can draft a clinic note or a radiology report could provide substantial time savings for the user. And an algorithm that extracts new information from images, like finding patients with unsuspected coronary artery disease or osteoporosis on a routine CT, can not only improve outcomes but also yield financial benefits to the healthcare organization.

Another opportunity is using large language models to engage patients in their care. My lab has designed a system to help patients understand their imaging test results. The patient receives a radiology report with the complex medical terms hyperlinked. If they don’t understand something, they can click the link and get a simple, clear explanation from a chatbot.
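As a rough sketch of how such a system might work (the glossary, URL scheme, and helper functions below are all hypothetical; a production system would generate explanations with a vetted medical chatbot rather than a fixed lookup):

# Minimal sketch: wrap known medical terms in a report with links to a
# patient-facing explainer, and serve a simple explanation per term.
import re

GLOSSARY = {
    "pneumothorax": "a collapsed lung caused by air leaking into the chest",
    "atelectasis": "partial collapse of a small area of lung tissue",
}

def link_terms(report: str) -> str:
    """Hyperlink each glossary term so patients can click for an explanation."""
    for term in GLOSSARY:
        report = re.sub(
            rf"\b{term}\b",
            lambda m: f'<a href="/explain?term={term}">{m.group(0)}</a>',
            report,
            flags=re.IGNORECASE,
        )
    return report

def explain(term: str) -> str:
    """Return a plain-language explanation; a chatbot would generate this."""
    return GLOSSARY.get(term.lower(), "No explanation available.")

print(link_terms("Small right pneumothorax with adjacent atelectasis."))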

Where are the biggest pitfalls?

I think the role of large language models in medicine is overhyped right now. A spate of recent papers has received attention for showing that these models can pass medical certifying exams and solve abstract clinical problems. I don’t see a clear path to regulatory approval for using these large models in that way. The hallucination issue will be difficult to shake. And because the training data is essentially the entire internet, the system has a skewed view of the probability of disease, based on what gets attention online.

Instead, we can pre-train models exclusively on large amounts of high-quality medical data and use them as the foundation for specialized downstream models. I don’t need a generalist model that can recommend restaurants and do many other things that are useful elsewhere but wasted effort in medicine. What I want is a model that provides high-quality assistance with medical decisions.
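A minimal sketch of that pipeline, using the Hugging Face transformers API: the checkpoint path and label scheme below are placeholders, not a real medical model.

# Start from a model pre-trained only on medical text (path is a
# placeholder) and fine-tune it for a specific downstream clinical task,
# rather than relying on a generalist internet-trained model.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "path/to/medical-domain-checkpoint"  # hypothetical
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    num_labels=2,  # e.g., flag vs. don't flag a report for follow-up
)

inputs = tokenizer(
    "Coronary artery calcification noted on routine chest CT.",
    return_tensors="pt",
)
logits = model(**inputs).logits  # task-specific prediction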

Another pitfall is how much we still have to learn about the best way for humans to interact with these systems. We know that the combination of human and machine is likely to be better than either one alone. But the wrong human-machine interaction, often due to poor system design, can lead to bad outcomes. There has been some great work on how systems can better explain their results, which is important for trustworthy systems. But there is so much more we need to do to design the optimal human-AI interaction.

What surprised you from the day’s conversation?

I was pleasantly surprised by how engaged our government colleagues were. Often regulators are reluctant to share much about their processes. But we had many frank discussions about the concerns they face. I had the sense that our regulators and policymakers are receptive to change. In particular, there was a sense that there should be an increased emphasis on post-market surveillance, rather than pre-market clearance. There has been good progress in that area as well. I think that is a healthy dynamic.

What are the next steps for you? For this group of attendees?

We plan to write up our conclusions and recommendations in a series of policy briefs. The meeting was structured around three use cases: clinical decision support, such as an AI algorithm that detects abnormalities on images; enterprise AI, such as applications that can listen to a doctor-patient interaction and draft a note; and consumer-facing AI, such as mental health chatbots. We will have some recommendations for policymakers in each of these areas.

