Operationalizing Real-Time Monitoring of Clinical AI

Date: May 14, 2026

Topics: Healthcare; Regulation, Policy, Governance

Abstract

This brief demonstrates how real-time monitoring can address critical gaps in the oversight of radiological AI tools.

Key Takeaways

Radiological AI tools account for the largest share of FDA-authorized healthcare AI, yet clinical adoption remains slow and most deployed systems lack robust performance monitoring.
We introduce the Ensemble Monitoring Model (EMM) — a framework that assesses uncertainty in the predictions of radiology AI models trained to detect abnormalities (in this case, brain bleeds), thereby providing clinicians with actionable signals at the point of care and enabling real-time monitoring of AI tool performance after clinical adoption.
EMM addresses an urgent gap by offering a practical, customizable method for signaling when confidence is low in real time, diagnosing failure modes, and supporting retraining of clinical AI when needed.
Policymakers should treat continuous performance monitoring as a core component of responsible AI deployment in healthcare and consider requiring healthcare AI vendors to put in place post-deployment monitoring mechanisms.
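Post-deployment performance monitoring can be illustrated by tracking how often recent cases are flagged as low-confidence. The following is a minimal sketch under stated assumptions. The brief does not specify how EMM aggregates per-case signals over time. The class name `FlagRateMonitor`, the sliding window, and the alert rate are hypothetical, and each case is assumed to have already been marked low-confidence or not by a per-case check.

```python
from collections import deque


class FlagRateMonitor:
    """Track the share of recent cases flagged as low-confidence.

    Hypothetical design for illustration: the window size and alert rate are
    assumptions, not values taken from the EMM paper.
    """

    def __init__(self, window: int = 100, alert_rate: float = 0.2) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self.recent: deque = deque(maxlen=window)  # oldest flag drops out automatically
        self.alert_rate = alert_rate

    def rate(self) -> float:
        """Fraction of cases in the current window that were flagged."""
        return sum(self.recent) / len(self.recent) if self.recent else 0.0

    def record(self, flagged: bool) -> bool:
        """Add one case; return True only when a full window exceeds the alert rate."""
        self.recent.append(flagged)
        window_full = len(self.recent) == self.recent.maxlen
        return window_full and self.rate() > self.alert_rate
```

Waiting for a full window before alerting avoids noisy alarms from the first few cases. Under this sketch's assumptions, a sustained rise in the flag rate would be a cue to review recent cases and, if needed, consider retraining.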

Executive Summary

AI tools are increasingly used in radiology, with the specialty accounting for approximately 76% of all FDA-authorized AI-enabled medical devices as of December 2025. A variety of tools can detect anomalies in X-rays or CT scans and provide diagnostic support. Yet many of these AI systems are deployed with limited mechanisms for monitoring and evaluating their performance, leaving clinicians to determine on their own which AI outputs are reliable. Without effective post-deployment oversight, these tools risk contributing to diagnostic errors and missed findings.

In our paper “Automated real-time assessment of intracranial hemorrhage detection AI using an ensemble monitoring model (EMM),” we introduce a new framework for monitoring the performance of AI radiology tools in real time after deployment. Inspired by clinical consensus practices, the Ensemble Monitoring Model (EMM) measures the agreement between a primary AI model's prediction and those of an ensemble of five independent submodels, using that agreement to estimate uncertainty without requiring access to the primary model's internal components. Using a large dataset focused on the detection of brain bleeds, we demonstrate that EMM can effectively characterize AI model uncertainty in real time at the point of care, that is, when radiologists review both the images and the corresponding AI output. In doing so, it can reduce radiologists’ cognitive burden and guide appropriate responses when cases are flagged as likely to be less accurate.
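The agreement idea can be illustrated with a minimal sketch. It assumes that each model emits a binary bleed/no-bleed call. The function name `assess_case`, the agreement-fraction rule, and the 0.6 threshold are hypothetical choices for illustration, not the paper's exact scoring rule. The sketch does reflect one key property described above: only the primary model's final output is used.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CaseAssessment:
    agreement: float  # share of submodels whose call matches the primary model's call
    flagged: bool     # True when agreement falls below the hypothetical threshold


def assess_case(primary_positive: bool,
                submodel_positive: Sequence[bool],
                min_agreement: float = 0.6) -> CaseAssessment:
    """Compare a primary model's binary call with an ensemble of submodel calls.

    Only the primary model's final output is used, so its weights, training data,
    and intermediate outputs are never needed. The agreement-fraction rule and
    the 0.6 threshold are illustrative assumptions.
    """
    if not submodel_positive:
        raise ValueError("at least one submodel output is required")
    matches = sum(1 for call in submodel_positive if call == primary_positive)
    agreement = matches / len(submodel_positive)
    return CaseAssessment(agreement=agreement, flagged=agreement < min_agreement)


# Hypothetical case: the primary model reports a bleed, but only 2 of 5 submodels agree.
result = assess_case(True, [True, True, False, False, False])
print(result)  # CaseAssessment(agreement=0.4, flagged=True)
```

Because the check treats the primary model as a black box, the same function would apply to a closed commercial product. Only its output needs to be captured.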

The growing reliance on AI in radiology and healthcare more broadly highlights that effective governance cannot stop at product approval. There is a critical need for total lifecycle management that ensures AI tools remain safe, accurate, and reliable after they are deployed in clinical settings. EMM enables AI models to be continually optimized and monitored after deployment. Policymakers should view methods like EMM as an important component of a broader regulatory strategy to ensure that AI in healthcare delivers measurable benefits without introducing new and unmanaged risks.

Introduction

Despite an exponential increase in FDA-cleared radiological AI tools over the last decade, clinical adoption has been slow. These tools promise to improve clinical efficiency, for example by supporting radiology tasks such as detecting anomalies in medical images and classifying or prioritizing cases. Yet their adoption has also raised safety concerns, including a potential increase in misdiagnosis driven by cognitive pitfalls such as automation bias or confirmation bias. As a result, clinicians often have to meticulously verify each AI result.

Evidence shows that clinicians are strongly influenced by how certain an AI model claims to be about its predictions. When a system provides clear confidence information, physicians are more likely to incorporate the output into their decision-making. When no measure of certainty is available, clinicians are left to rely only on their own judgment and tend to trust the model far less.

Today, most monitoring of radiology AI systems still relies on retrospective, labor-intensive reviews of small sets of manually labeled data, which provide only a partial view of real-world performance. To address this problem, researchers have developed a range of real-time techniques for estimating model confidence. Some use the same dataset the AI system was trained on as a reference for monitoring it. Others approximate predictive reliability using “deep ensembles,” i.e., multiple independently trained models that share the same architecture but each start from a different random initialization, causing them to learn in subtly different ways.
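The deep-ensemble idea can be made concrete with a minimal sketch. Tiny one-feature logistic-regression models stand in for deep networks, which is a simplifying assumption. The helper names and the toy data are hypothetical. Members differ only in their random seed, which sets both the starting weights and the sample order. The spread of their predictions serves as the uncertainty signal. Note that every step requires the training data and the ability to train models, which is the internal access that commercial products rarely expose.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_member(data: Sequence[Tuple[float, int]], seed: int,
                 epochs: int = 200, lr: float = 0.1) -> Tuple[float, float]:
    """Train one logistic-regression member with SGD; only the seed differs between members."""
    rng = random.Random(seed)
    w, b = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)  # random starting point
    order = list(data)
    for _ in range(epochs):
        rng.shuffle(order)                            # seed also changes the sample order
        for x, y in order:
            p = sigmoid(w * x + b)
            w -= lr * (p - y) * x                     # gradient step on the log loss
            b -= lr * (p - y)
    return w, b


def ensemble_predict(members: List[Tuple[float, float]], x: float) -> Tuple[float, float]:
    """Return the mean predicted probability and its spread (population std) across members."""
    probs = [sigmoid(w * x + b) for w, b in members]
    return statistics.mean(probs), statistics.pstdev(probs)


# Hypothetical one-feature toy data: label 1 when the feature is positive.
data = [(-2.0, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (2.0, 1)]
members = [train_member(data, seed) for seed in range(5)]
mean_near, spread_near = ensemble_predict(members, 0.1)  # near the decision boundary
mean_far, spread_far = ensemble_predict(members, 2.0)    # far from the boundary
```

In this toy setting, members tend to disagree more near the decision boundary than far from it, so a larger spread marks a less certain prediction.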

While these techniques can be effective in research settings, they share a major practical limitation: nearly all of them require access to internal model components such as training datasets, model weights, or intermediate outputs. Commercial AI products are typically deployed as closed, black box systems, so this approach is largely infeasible for them. This leaves healthcare providers and policymakers without a means of overseeing these tools once they are in clinical use.

There is a need for real-time monitoring systems that can automatically characterize model confidence at the point of care without requiring access to internal model details. While measuring prediction uncertainty represents only one dimension of AI oversight — model performance can also be undermined by factors such as flawed input data, poor image quality, or improper image presentation — it remains a particularly important and substantive component of effective post-deployment evaluation.

Authors: Zhongnan Fang, Lina Cheuy, Hye Sun Na, Akshay Chaudhari, David B. Larson
