Toward Stronger FDA Approval Standards for AI Medical Devices

This brief examines the FDA’s medical AI device approval process and urges policymakers to close the gaps created by the growth of AI-enabled healthcare.
Executive Summary
As the development and adoption of artificial intelligence-enabled healthcare tools continue to accelerate, regulators and researchers are beginning to confront oversight concerns in the clinical evaluation process that, if left unchecked, could harm patient health. Since January 2015, the United States Food and Drug Administration (FDA) has evaluated and cleared over 100 AI-based medical devices using an evaluation process that has not been adapted to the unique risks of AI and is in urgent need of improvement. Indeed, the FDA itself has recently called for improving the quality of evaluation data, increasing trust and transparency between developers and users, monitoring algorithmic performance and bias on the intended population, and testing with clinicians in the loop. Although academics are beginning to develop new reporting guidelines for clinical trials, there are currently no established best practices for evaluating commercially available AI medical devices to ensure their reliability and safety.
In the paper titled “How Medical AI Devices Are Evaluated: Limitations and Recommendations from an Analysis of FDA Approvals,” we examined the evaluations performed on 130 FDA-approved AI medical devices between January 2015 and December 2020. The shortcomings were significant: 97% of the devices underwent only retrospective evaluations, which rely on previously collected data rather than data gathered during deployment and are therefore far less credible; 72% did not publicly report whether the algorithm was tested at more than one site; and 45% did not report basic information such as sample size. Using a model designed to detect collapsed lungs in chest X-rays, we also show that evaluating an algorithm at only a single site can mask performance degradation and potential demographic bias that emerge when the model is applied to data from other sites.
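To make the single-site concern concrete, the sketch below compares a model's discrimination, measured by the area under the ROC curve (AUROC), on data from its development site against data from external sites. It is a minimal, hypothetical illustration rather than the experiment from our paper: the site names, labels, and scores are synthetic placeholders generated in Python.

# Illustrative sketch only: per-site AUROC comparison with synthetic data.
# The site names, the "signal" values, and the data are hypothetical
# placeholders, not results from the paper.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def synthetic_site(n, signal):
    # Placeholder labels and model scores for one site; `signal` stands in
    # for how closely the site's data matches the model's training data.
    labels = rng.integers(0, 2, size=n)
    scores = labels * signal + rng.normal(0.0, 1.0, size=n)
    return labels, scores

sites = {
    "development_site": synthetic_site(500, signal=2.0),  # in-distribution
    "external_site_a": synthetic_site(500, signal=1.2),   # moderate shift
    "external_site_b": synthetic_site(500, signal=0.7),   # larger shift
}

for name, (labels, scores) in sites.items():
    print(f"{name}: AUROC = {roc_auc_score(labels, scores):.3f}")

Run on real data from multiple sites, this kind of per-site breakdown is what a multisite evaluation requirement would surface, including further stratification by demographic subgroup to detect bias.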
The findings from our research ultimately led us to the following three policy recommendations:
1. Ensure future FDA-approved AI devices undergo multisite evaluations.
2. Encourage more prospective studies (those in which the test data is collected and evaluated concurrently with device deployment) that include a comparison to the current standard of care without AI.
3. Mandate post-market surveillance of medical AI devices to better understand the unintended outcomes and biases not detected in the pre-market evaluation process (a minimal sketch of such monitoring follows below).
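As an illustration of the third recommendation, the sketch below recomputes a performance metric on rolling batches of post-deployment cases and flags batches that fall well below a pre-deployment baseline. It is a hypothetical outline, not an FDA procedure or a method from our paper; the function name, the batch format, and the tolerance threshold are all assumptions.

# Hypothetical post-market monitoring sketch; names and thresholds are
# placeholders, not regulatory requirements.
from sklearn.metrics import roc_auc_score

def monitor_batches(batches, baseline_auroc, tolerance=0.05):
    # `batches` is an iterable of (batch_id, labels, scores) tuples collected
    # after deployment, e.g. one batch per month or per site; a batch is
    # flagged for review when its AUROC drops more than `tolerance` below
    # the pre-deployment baseline.
    results = []
    for batch_id, labels, scores in batches:
        auroc = roc_auc_score(labels, scores)
        results.append((batch_id, auroc, auroc < baseline_auroc - tolerance))
    return results

In practice, such monitoring would also stratify results by site and demographic subgroup, since aggregate metrics can hide exactly the biases this recommendation is meant to catch.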