
Stanford’s Multimodal AI Model Advances Personalized Cancer Care

The MUSK model combines clinical notes and images to predict prognosis and immunotherapy response.

Image: A doctor meets with a patient to discuss his prognosis. (iStock)

A human pathologist is still the gold standard for diagnosing disease today. Unlike current artificial intelligence models, doctors don't rely on a single data source to make clinical decisions; they factor in a patient's demographics, medical history, imaging, and other characteristics of the disease. That is why AI has largely remained an assistive tool for doctors.

But now, researchers at Stanford have built a more useful AI model that factors in clinical notes and images to help predict patient outcomes and determine which treatment might work best. The new AI model – nicknamed MUSK for Multimodal transformer with Unified maSKed modeling – can learn from unlabeled, unpaired image-text data at large scale. The research, partially funded by the Stanford Institute for Human-Centered AI, was published in the journal Nature in early January.

Instead of analyzing a single data source in isolation, MUSK is multimodal: it consults clinical notes and images that humans haven't had to manually pair.

“We try to extract concrete, complementary information from both modalities so that we can make a good clinical decision that cannot be achieved by a single modality,” said Jinxi Xiang, lead author of the study and a Stanford postdoctoral scholar in radiation oncology.

Doctors most often need help with predicting outcomes and with choosing precise cancer therapies. Equipped with both images and text, MUSK can better predict how a patient might respond to certain types of cancer treatment.

To develop that capability, the researchers pretrained MUSK on 50 million pathology images and 1 billion pathology-related text tokens (commonly grouped words and characters), representing 33 tumor types. That scale of training is a dramatic increase over the paired image-text datasets used in existing studies.

“Compared to the traditional AI approach, you can leverage unlabeled, large-scale, diverse data, so you don’t have to ask human experts to label them,” said Ruijiang Li, Stanford associate professor (research) of radiation oncology, whose lab focuses on applying machine and deep learning to medical imaging analysis and precision oncology. “Now we have designed this new architecture that can take in unpaired multimodal data sets for pretraining, so you are able to leverage a much larger data set to train more robust models.”
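For readers curious about what masked pretraining on unpaired data looks like in practice, the minimal sketch below illustrates the general idea: a shared transformer learns to reconstruct randomly masked words in pathology reports and randomly masked patches in pathology images, and each training batch can come from either modality alone, so image-text pairing is never required. All module names, sizes, and the masking scheme here are illustrative assumptions, not MUSK's actual architecture.

```python
# Minimal sketch of unified masked pretraining on unpaired image and text data.
# Everything here (dimensions, masking ratio, heads) is an illustrative assumption.
import torch
import torch.nn as nn

class TinyMultimodalEncoder(nn.Module):
    """A shared transformer that encodes either image patch tokens or text tokens."""
    def __init__(self, vocab_size=30522, dim=256, patch_feat_dim=768):
        super().__init__()
        self.text_embed = nn.Embedding(vocab_size, dim)
        self.patch_embed = nn.Linear(patch_feat_dim, dim)      # hypothetical patch feature size
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.text_head = nn.Linear(dim, vocab_size)             # predict masked word tokens
        self.image_head = nn.Linear(dim, patch_feat_dim)        # reconstruct masked patch features

    def mask_inputs(self, x, mask_ratio=0.3):
        """Randomly replace a fraction of token embeddings with the learned mask token."""
        mask = torch.rand(x.shape[:2], device=x.device) < mask_ratio
        x = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(x), x)
        return x, mask

    def forward_text(self, token_ids):
        x, mask = self.mask_inputs(self.text_embed(token_ids))
        return self.text_head(self.encoder(x)), mask

    def forward_image(self, patch_feats):
        x, mask = self.mask_inputs(self.patch_embed(patch_feats))
        return self.image_head(self.encoder(x)), mask

# Each batch comes from a single modality, so no image-text pairing is needed.
model = TinyMultimodalEncoder()
token_ids = torch.randint(0, 30522, (2, 64))     # unpaired pathology report snippets
patch_feats = torch.randn(2, 196, 768)           # unpaired whole-slide image patches

logits, text_mask = model.forward_text(token_ids)
text_loss = nn.functional.cross_entropy(
    logits[text_mask], token_ids[text_mask])     # predict only the masked words

recon, img_mask = model.forward_image(patch_feats)
image_loss = nn.functional.mse_loss(
    recon[img_mask], patch_feats[img_mask])      # reconstruct only the masked patches

(text_loss + image_loss).backward()
```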

So far, the Stanford team has tested MUSK with multimodal data from more than 8,000 patients, and the results show promise over existing models. By blending images and clinical reports, the model more accurately predicts outcomes such as melanoma relapse and how patients with lung and gastro-esophageal cancers might respond to immunotherapy. The team also found it performed well at predicting prognosis across 16 cancer types, especially common cancers like breast, lung, and colorectal cancer.
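As a rough illustration of how such a pretrained model can be put to work downstream, the hypothetical sketch below fuses one image embedding and one report embedding per patient and trains a small classifier to predict an outcome such as immunotherapy response. The interface, feature sizes, and labels are placeholders, not MUSK's actual API or the study's evaluation setup.

```python
# Hypothetical downstream use: fuse image and report embeddings from a pretrained
# multimodal encoder to predict a patient outcome. Names and sizes are placeholders.
import torch
import torch.nn as nn

class OutcomeHead(nn.Module):
    """Small classifier on top of concatenated image and text embeddings."""
    def __init__(self, embed_dim=256, num_classes=2):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(embed_dim * 2, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, image_embedding, text_embedding):
        fused = torch.cat([image_embedding, text_embedding], dim=-1)
        return self.classifier(fused)

# Embeddings would come from the frozen pretrained encoder; random stand-ins here.
image_emb = torch.randn(8, 256)       # one vector per patient's pathology slide
text_emb = torch.randn(8, 256)        # one vector per patient's clinical report
labels = torch.randint(0, 2, (8,))    # e.g., responder vs. non-responder

head = OutcomeHead()
loss = nn.functional.cross_entropy(head(image_emb, text_emb), labels)
loss.backward()
```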

Moving forward, the MUSK approach to digital pathology could be generalized to other types of medical and biological data. But for now, Li said, the team needs to gather more evidence for MUSK before it can be deployed in a clinical setting; the model would also need to undergo a clinical trial and then receive regulatory approval.

“It’s a major step forward and valuable contribution to the field of multimodal foundation models,” Li added.
