Get the latest news, advances in research, policy work, and education program updates from HAI in your inbox weekly.
Sign Up For Latest News

Bloom, a health coaching app created by Stanford researchers, helps people tap into their own motivations.
Bloom, a health coaching app created by Stanford researchers, helps people tap into their own motivations.


Stanford scholars respond to a federal RFC on evaluating AI-enabled medical devices, recommending policy interventions to help mitigate the harms of AI-powered chatbots used as therapists.
Stanford scholars respond to a federal RFC on evaluating AI-enabled medical devices, recommending policy interventions to help mitigate the harms of AI-powered chatbots used as therapists.




Practicing specific social skills with AI chatbots helps users build confidence and competence.
Practicing specific social skills with AI chatbots helps users build confidence and competence.

Artificial intelligence (AI) tools for radiology are commonly unmonitored once deployed. The lack of real-time case-by-case assessments of AI prediction confidence requires users to independently distinguish between trustworthy and unreliable AI predictions, which increases cognitive burden, reduces productivity, and potentially leads to misdiagnoses. To address these challenges, we introduce Ensembled Monitoring Model (EMM), a framework inspired by clinical consensus practices using multiple expert reviews. Designed specifically for black-box commercial AI products, EMM operates independently without requiring access to internal AI components or intermediate outputs, while still providing robust confidence measurements. Using intracranial hemorrhage detection as our test case on a large, diverse dataset of 2919 studies, we demonstrate that EMM can successfully categorize confidence in the AI-generated prediction, suggest appropriate actions, and help physicians recognize low confidence scenarios, ultimately reducing cognitive burden. Importantly, we provide key technical considerations and best practices for successfully translating EMM into clinical settings.
Artificial intelligence (AI) tools for radiology are commonly unmonitored once deployed. The lack of real-time case-by-case assessments of AI prediction confidence requires users to independently distinguish between trustworthy and unreliable AI predictions, which increases cognitive burden, reduces productivity, and potentially leads to misdiagnoses. To address these challenges, we introduce Ensembled Monitoring Model (EMM), a framework inspired by clinical consensus practices using multiple expert reviews. Designed specifically for black-box commercial AI products, EMM operates independently without requiring access to internal AI components or intermediate outputs, while still providing robust confidence measurements. Using intracranial hemorrhage detection as our test case on a large, diverse dataset of 2919 studies, we demonstrate that EMM can successfully categorize confidence in the AI-generated prediction, suggest appropriate actions, and help physicians recognize low confidence scenarios, ultimately reducing cognitive burden. Importantly, we provide key technical considerations and best practices for successfully translating EMM into clinical settings.