Multimodal AI refers to artificial intelligence systems that can process, understand, and generate multiple types of data modalities simultaneously—such as text, images, audio, and video. Unlike traditional AI models that work with only one type of input, multimodal systems can combine information from different sources to gain richer understanding and produce more comprehensive outputs.
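The idea of combining information from different sources can be sketched as a simple "late fusion" pipeline: each modality is encoded into a vector separately, and the vectors are then merged into one joint representation. The encoders and projection weights below are random stand-ins, not any real model; this is a minimal illustrative sketch, not how any particular multimodal system is implemented.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_text(token_ids, dim=8):
    # Stand-in text encoder: average of random token embeddings
    # (a real system would use a trained language model here).
    table = rng.normal(size=(100, dim))
    return table[token_ids].mean(axis=0)

def encode_image(pixels, dim=8):
    # Stand-in image encoder: global average pooling over spatial
    # dimensions, followed by a random linear projection.
    pooled = pixels.mean(axis=(0, 1))            # shape: (channels,)
    proj = rng.normal(size=(pixels.shape[2], dim))
    return pooled @ proj

def fuse(text_vec, image_vec, out_dim=4):
    # Late fusion: concatenate the per-modality embeddings, then
    # apply one joint linear head over the combined vector.
    joint = np.concatenate([text_vec, image_vec])
    weights = rng.normal(size=(joint.size, out_dim))
    return joint @ weights

text_emb = encode_text(np.array([1, 5, 9]))      # "text" input
image_emb = encode_image(rng.normal(size=(4, 4, 3)))  # "image" input
fused = fuse(text_emb, image_emb)                # joint representation
print(fused.shape)
```

The fused vector could then feed a downstream task head (a classifier, a decoder, and so on); real systems differ mainly in using trained, far larger encoders and in *where* the modalities are merged (early, late, or via cross-attention).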
Explore Similar Terms:
Computer Vision | Natural Language Processing (NLP) | Transformer

The MUSK model combines clinical notes and images to predict prognosis and immunotherapy response.

Thirty-two interdisciplinary teams will receive $2.37 million in Seed Research Grants to work toward initial results on ambitious proposals.

Five teams will use the funding to advance their work in biology, generative AI and creativity, policing, and more.

Leading Stanford faculty offer their expectations for artificial intelligence in the new year.

The new AI Index spots major advances in multimodal models, robotics, generative AI, and more.

Researchers are establishing standards to validate the efficacy of AI agents in clinical settings.