Operationalizing Real-Time Monitoring of Clinical AI | Stanford HAI

Policy Brief

Operationalizing Real-Time Monitoring of Clinical AI

Date: May 14, 2026
Topics: Healthcare; Regulation, Policy, Governance
Abstract

This brief demonstrates how real-time monitoring can address critical gaps in the oversight of radiological AI tools.

Key Takeaways

  • Radiological AI tools account for the largest share of FDA-authorized healthcare AI, yet clinical adoption remains slow and most deployed systems lack robust performance monitoring.

  • We introduce the Ensemble Monitoring Model (EMM), a framework that assesses uncertainty in the predictions of radiology AI models trained to detect abnormalities (in this case, brain bleeds). EMM provides clinicians with actionable signals at the point of care and enables real-time monitoring of AI tool performance after clinical adoption.

  • EMM addresses an urgent gap by offering a practical, customizable method for signaling in real time when confidence is low, diagnosing failure modes, and supporting retraining of clinical AI when needed.

  • Policymakers should treat continuous performance monitoring as a core component of responsible AI deployment in healthcare and consider requiring healthcare AI vendors to put in place post-deployment monitoring mechanisms.

Executive Summary

AI tools are increasingly used in radiology, with the specialty accounting for approximately 76% of all FDA-authorized AI-enabled medical devices as of December 2025. A variety of tools can detect anomalies in X-rays or CT scans and provide diagnostic support. Yet many of these AI systems are deployed with limited mechanisms for monitoring and evaluating their performance, leaving clinicians to determine on their own which AI outputs are reliable. Without effective post-deployment oversight, these tools risk contributing to diagnostic errors and missed findings.

In our paper “Automated real-time assessment of intracranial hemorrhage detection AI using an ensemble monitoring model (EMM),” we introduce a new framework to enable real-time monitoring of AI radiology tool performance after deployment. Inspired by clinical consensus practices, the Ensemble Monitoring Model (EMM) measures agreement between a primary AI model and an ensemble of five independent submodels to estimate uncertainty without requiring access to black box model components. Using a large dataset focused on the detection of brain bleeds, we demonstrate that EMM can reduce radiologists’ cognitive burden by effectively characterizing AI model uncertainty in real time at the point of care — when radiologists review both the images and the corresponding AI output — and guiding appropriate responses when cases are flagged for reduced accuracy.
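The agreement measurement described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `emm_agreement` and `confidence_flag`, the binary label encoding (1 for hemorrhage, 0 for none), and the 0.8 threshold are hypothetical choices made for the example. The brief itself states only that EMM measures agreement between a primary model and an ensemble of five independent submodels.

```python
from typing import Sequence


def emm_agreement(primary_label: int, submodel_labels: Sequence[int]) -> float:
    """Return the fraction of submodels whose prediction matches the primary model's."""
    if not submodel_labels:
        raise ValueError("at least one submodel prediction is required")
    matches = sum(1 for label in submodel_labels if label == primary_label)
    return matches / len(submodel_labels)


def confidence_flag(agreement: float, threshold: float = 0.8) -> str:
    """Map agreement to a coarse signal; the threshold is a hypothetical choice."""
    return "higher confidence" if agreement >= threshold else "review recommended"


# Hypothetical case: the primary model predicts hemorrhage (1)
# and the five submodels each cast one vote.
primary = 1
submodels = [1, 1, 0, 1, 1]
agreement = emm_agreement(primary, submodels)
print(agreement, confidence_flag(agreement))
```

Note that the sketch consumes only model outputs, which is consistent with the brief's point that uncertainty can be estimated without access to a black box model's internal components. In practice, any threshold would need to be tuned to the clinical setting.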

The growing reliance on AI in radiology and healthcare more broadly highlights that effective governance cannot stop at product approval. There is a critical need for total lifecycle management that ensures AI tools remain safe, accurate, and reliable after they are deployed in clinical settings. EMM enables AI models to be continually optimized and monitored after deployment. Policymakers should view methods like EMM as an important component of a broader regulatory strategy to ensure that AI in healthcare delivers measurable benefits without introducing new and unmanaged risks. 

Introduction

Despite an exponential increase in FDA-cleared radiological AI tools over the last decade, clinical adoption has been slow. These tools promise to enhance clinical efficiency, for example by supporting radiology tasks that involve detecting anomalies in medical images and classifying or prioritizing different cases. Yet their adoption has also been accompanied by safety concerns, including a potential increase in misdiagnosis driven by cognitive pitfalls such as automation bias or confirmation bias. As a result, clinicians often have to meticulously verify each AI result.

Evidence shows that clinicians are strongly influenced by how certain an AI model claims to be about its predictions. When a system provides clear confidence information, physicians are more likely to incorporate the output into their decision-making. When no measure of certainty is available, clinicians are left to rely only on their own judgment and tend to trust the model far less. 

Today, most monitoring of radiology AI systems still relies on retrospective, labor-intensive reviews of a small amount of manually labeled data, which provide only a partial view of real-world performance. To address this problem, researchers have developed a range of real-time monitoring techniques that estimate model confidence by drawing on the same dataset the AI system was trained on. Other methods approximate predictive reliability through the use of “deep ensembles,” i.e., a collection of multiple smaller, independent models that share the same model architecture but are each trained from a different random starting point, causing them to learn in subtly different ways.
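The deep-ensemble idea can be illustrated with a toy sketch. Everything here is an assumption made for the example: the synthetic two-feature dataset, the small one-hidden-layer network, the hyperparameters, and the helper names `train_member` and `predict`. It is not drawn from the paper. The sketch trains five copies of the same architecture from different random seeds and uses the spread of their predicted probabilities as a rough uncertainty signal.

```python
import numpy as np


def train_member(X, y, seed, hidden=8, steps=300, lr=0.5):
    """Train one small network; only the random seed differs between members."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 1.0, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0, hidden)
    b2 = 0.0
    for _ in range(steps):
        H = np.tanh(X @ W1 + b1)                   # hidden activations
        p = 1.0 / (1.0 + np.exp(-(H @ W2 + b2)))   # predicted probability
        g = (p - y) / len(y)                       # gradient of mean log loss w.r.t. the logit
        gH = np.outer(g, W2) * (1.0 - H ** 2)      # backpropagate through tanh
        W2 -= lr * (H.T @ g)
        b2 -= lr * g.sum()
        W1 -= lr * (X.T @ gH)
        b1 -= lr * gH.sum(axis=0)
    return W1, b1, W2, b2


def predict(params, X):
    """Predicted probability of the positive class for each row of X."""
    W1, b1, W2, b2 = params
    return 1.0 / (1.0 + np.exp(-(np.tanh(X @ W1 + b1) @ W2 + b2)))


# Synthetic training data (hypothetical): label is 1 when both features share a sign.
data_rng = np.random.default_rng(0)
X_train = data_rng.normal(size=(200, 2))
y_train = (X_train[:, 0] * X_train[:, 1] > 0).astype(float)

# Five members with identical architecture, each from a different random start.
members = [train_member(X_train, y_train, seed) for seed in range(1, 6)]

# One query inside the training range and one far outside it.
queries = np.array([[1.5, 1.5], [6.0, -6.0]])
probs = np.stack([predict(m, queries) for m in members])  # shape: (members, queries)
spread = probs.std(axis=0)  # larger spread suggests less reliable predictions
print("mean probability:", probs.mean(axis=0))
print("ensemble spread: ", spread)
```

The sketch also makes a practical constraint visible: building such an ensemble requires the training data and the model architecture, neither of which is normally available for a closed commercial product.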

While these techniques can be effective in research settings, they share a major practical limitation: Nearly all of them require access to internal model components such as training datasets, model weights, or intermediate outputs. For commercial AI products, which are typically deployed as closed, black box systems, these techniques are largely infeasible, leaving healthcare providers and policymakers without the means to oversee clinical adoption.

There is a need for real-time monitoring systems that can automatically characterize model confidence at the point of care without requiring access to internal model details. While measuring prediction uncertainty represents only one dimension of AI oversight — model performance can also be undermined by factors such as flawed input data, poor image quality, or improper image presentation — it remains a particularly important and substantive component of effective post-deployment evaluation.

Authors
  • Zhongnan Fang
  • Lina Cheuy
  • Hye Sun Na
  • Akshay Chaudhari
  • David B. Larson