
Medical and AI experts build a benchmark for evaluation of LLMs grounded in real-world healthcare needs.


This brief introduces Holistic Evaluation of Language Models (HELM) as a framework for evaluating commercial applications of AI.

Stanford HAI researchers develop a new benchmark suite designed to test difference awareness in AI models.

This explainer provides brief definitions for key terms associated with artificial intelligence, ranging from autonomous systems to deep learning and foundation models.

A team of researchers from Stanford HAI, MIT, and Princeton created the Foundation Model Transparency Index, which rated the transparency of 10 AI companies; each one received a failing grade.
When LLMs take surveys on personality traits, they, like people, exhibit a desire to appear likable.