Could Self-Supervised Learning Be a Game-Changer for Medical Image Classification?
To keep up with their workloads, radiologists must read a medical image approximately every three to four seconds.
That’s an unsustainable pace that could increase the potential for mistakes.
Many believe that AI tools could help radiologists by accurately flagging abnormal images more quickly than the human eye. But the typical “supervised” approach to training medical image classifiers requires huge volumes of images that must be labeled by the very radiologists who are overwhelmed with work.
And even if a research team gathers the resources to label medical images for one health problem, such as pneumonia, they still need to do it all over again for pneumothorax, pulmonary embolism, lung cancer, and so on.
“Using supervised methods to train medical image models is unscalable,” says Shih-Cheng (Mars) Huang, a fifth-year graduate student in biomedical informatics at Stanford University.
An alternative is to use lots of unlabeled data to pre-train models in a self-supervised way, Huang says. The model then learns what medical images look like in general, which dramatically reduces the number of labeled images needed later to teach it to spot specific diseases.
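To make that two-stage recipe concrete, here is a minimal PyTorch sketch of one common self-supervised setup: an encoder is first pre-trained on unlabeled images with a SimCLR-style contrastive objective, then fine-tuned on a much smaller labeled set. The backbone, augmentations, data, and hyperparameters below are illustrative assumptions, not a setup evaluated in the review.

```python
# Stage 1: self-supervised pre-training on unlabeled images (no labels needed).
# Stage 2: supervised fine-tuning on a much smaller labeled set.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models, transforms

encoder = models.resnet18(weights=None)   # backbone trained from scratch
encoder.fc = nn.Identity()                # drop the classification head
projector = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 64))

augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.5, 1.0)),
    transforms.RandomHorizontalFlip(),
])

def nt_xent(z1, z2, temperature=0.1):
    """Contrastive loss: pull two augmented views of the same image together."""
    z = F.normalize(torch.cat([z1, z2]), dim=1)
    sim = z @ z.t() / temperature
    mask = torch.eye(sim.size(0), dtype=torch.bool)
    sim = sim.masked_fill(mask, -1e9)      # never match an image with itself
    n = z1.size(0)
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])
    return F.cross_entropy(sim, targets)

# --- Stage 1: contrastive pre-training on a stand-in unlabeled batch ---
opt = torch.optim.Adam(list(encoder.parameters()) + list(projector.parameters()), lr=1e-3)
unlabeled = torch.rand(8, 3, 224, 224)     # placeholder for unlabeled scans
for _ in range(2):                         # toy number of steps
    v1, v2 = augment(unlabeled), augment(unlabeled)
    loss = nt_xent(projector(encoder(v1)), projector(encoder(v2)))
    opt.zero_grad(); loss.backward(); opt.step()

# --- Stage 2: fine-tune with labels (e.g. pneumonia vs. normal) ---
classifier = nn.Linear(512, 2)
labeled, labels = torch.rand(4, 3, 224, 224), torch.tensor([0, 1, 0, 1])
ft_opt = torch.optim.Adam(list(encoder.parameters()) + list(classifier.parameters()), lr=1e-4)
for _ in range(2):
    loss = F.cross_entropy(classifier(encoder(labeled)), labels)
    ft_opt.zero_grad(); loss.backward(); ft_opt.step()
```

In practice the unlabeled pool would contain thousands of studies and the contrastive stage would run for many epochs; the point of the sketch is simply that labels enter only in the second, much smaller stage.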
In a new review paper, Huang and his colleagues explore the various types of self-supervised approaches currently in use. They find that self-supervised learning outperforms supervised learning in the medical image classification arena. And while it’s still unclear which self-supervised strategies work best in different contexts, the most effective models combine several approaches.
Read the full paper, Self-supervised Learning for Medical Image Classification: A Systematic Review and Implementation Guidelines
The review also suggests that big gains could be made by training self-supervised models using not only medical images but their associated clinical data, including electronic health records and radiology reports. Early indications are that this approach could lead to generalized models capable of distinguishing between signs of many disease conditions rather than just one or a few.
“This type of self-supervised training using multi-modal data may be a step on the path to a true foundation model for medical image classification,” Huang says.
The Review
In their paper, Huang and his colleagues performed a systematic review of 79 peer-reviewed studies in which one of four different self-supervised methods, or a combination of them, had been used to train a medical image classifier to detect diseases of various types. Five studies reported slightly lower performance than a supervised learning strategy, but the large majority reported significantly better performance. In the 30 studies where researchers compared different self-supervised strategies on the same task, combined approaches proved better than any individual self-supervised strategy.
At this point, Huang says, researchers should definitely be using self-supervision to pre-train medical image classifiers. But it’s still not clear which approach or combination of approaches will prove most effective, he says. “Going forward, researchers should try out different types of self-supervised methods to see which ones work for a particular modality or particular clinical application.”
Medical vs. Natural Images: Be Aware of the Difference
When it comes to self-supervised learning for medical image classification, it’s important for researchers to keep in mind two ways that medical images differ from so-called “natural images,” Huang says. One key difference creates a hurdle for this work, while the other offers a significant advantage.
First, the hurdle: Unlike everyday images (cats, dogs, buses, cars, and so on), most medical images look alike, Huang says. “Distinguishing an unhealthy patient from a healthy patient depends on subtle image cues that are very localized and sometimes hard to detect.” This makes some of the key steps in self-supervised learning potentially problematic for medical images. For example, many of these methods use masking or cropping to remove parts of an image so that the model will learn to predict the missing portion. But doing that with radiology images could undermine the learning process if disease-related abnormalities are cropped or masked. In the review paper, Huang and his colleagues describe this conundrum and encourage researchers to keep it in mind as they train their models.
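A toy calculation makes the risk concrete. The image size, crop size, and lesion location below are made-up assumptions, but they show how easily a small, localized finding can fall outside the random crops a pretext task might use:

```python
# Toy illustration: how often does a random square crop keep a small lesion?
import random

IMG = 1024                      # image is IMG x IMG pixels (assumed)
LESION = (700, 720, 40, 40)     # hypothetical lesion: x, y, width, height
CROP = 512                      # random square crop used by the pretext task

def crop_keeps_lesion(trials=10_000):
    kept = 0
    for _ in range(trials):
        cx = random.randint(0, IMG - CROP)   # top-left corner of the crop
        cy = random.randint(0, IMG - CROP)
        lx, ly, lw, lh = LESION
        # lesion survives only if it lies entirely inside the crop window
        if cx <= lx and cy <= ly and lx + lw <= cx + CROP and ly + lh <= cy + CROP:
            kept += 1
    return kept / trials

print(f"fraction of random crops that keep the whole lesion: {crop_keeps_lesion():.2f}")
```

With these assumed numbers, most random crops lose at least part of the lesion, which is exactly the failure mode the authors warn about.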
Second, the advantage: Unlike many natural images, medical images are often accompanied by abundant metadata, including not only associated clinical health records containing patient demographics and health conditions but also radiology reports, which can serve almost as well as a labeled dataset. Just as physicians often rely on this information when interpreting medical images, so too should AI models be trained to learn from all these data sources — and they can do so in a self-supervised way, Huang notes.
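One way such image-report pre-training can look in code is a CLIP/ConVIRT-style contrastive setup, in which an image encoder and a text encoder learn to pair each scan with its own report. The sketch below is a generic illustration under assumed encoders, vocabulary size, and toy data, not the implementation behind any study in the review.

```python
# Image-report contrastive pre-training sketch: match each scan to its report.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models

image_encoder = models.resnet18(weights=None)
image_encoder.fc = nn.Linear(512, 128)                 # project images to 128-d

VOCAB = 5000                                           # toy report vocabulary size
text_encoder = nn.Sequential(                          # stand-in for a BERT-like model
    nn.EmbeddingBag(VOCAB, 256), nn.ReLU(), nn.Linear(256, 128)
)

def clip_style_loss(img_emb, txt_emb, temperature=0.07):
    """Align each image with its own report; push apart mismatched pairs."""
    img_emb = F.normalize(img_emb, dim=1)
    txt_emb = F.normalize(txt_emb, dim=1)
    logits = img_emb @ txt_emb.t() / temperature       # pairwise similarities
    targets = torch.arange(img_emb.size(0))            # i-th image <-> i-th report
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

# Toy batch: 4 chest X-rays paired with 4 tokenized radiology reports
images = torch.rand(4, 3, 224, 224)
reports = torch.randint(0, VOCAB, (4, 32))             # 32 token ids per report

opt = torch.optim.Adam(list(image_encoder.parameters()) + list(text_encoder.parameters()), lr=1e-4)
loss = clip_style_loss(image_encoder(images), text_encoder(reports))
opt.zero_grad(); loss.backward(); opt.step()
```

No disease labels appear anywhere in this loop: the report text itself provides the supervisory signal, which is what makes the abundant metadata around medical images such an advantage.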
For his dissertation, Huang hopes to take advantage of these data sources to work toward a foundation model for medical image classification. Ideally, such a model would not need a supervised fine-tuning step at all, Huang says. From the self-supervised pre-training step alone, it would be able to identify and differentiate between many disease conditions across a variety of medical image types.
“We’re not there yet,” Huang says. “But applying this method at scale might move the needle.”