AI’s Ostensible Emergent Abilities Are a Mirage

Date
May 08, 2023
Topics
Natural Language Processing
Machine Learning
iStock/Will Petri

According to Stanford researchers, large language models are not greater than the sum of their parts.

For a few years now, tech leaders have been touting AI’s supposed emergent abilities: the idea that beyond a certain threshold of complexity, large language models (LLMs) begin doing unpredictable things. If we can harness that capacity, the story goes, AI might be able to solve some of humanity’s biggest problems. But unpredictability is also scary: Could making a model bigger unleash a completely unpredictable and potentially malevolent actor into the world?

That concern is shared by many in the tech industry. Indeed, a recently publicized open letter signed by more than 1,000 tech leaders calls for a six-month pause on giant AI experiments as a way to step back from “the dangerous race to ever-larger unpredictable black-box models with emergent capabilities.”

But according to a new paper, we can perhaps put that particular concern about AI to bed, says lead author Rylan Schaeffer, a second-year graduate student in computer science at Stanford University. “With bigger models, you get better performance,” he says, “but we don’t have evidence to suggest that the whole is greater than the sum of its parts.”

Indeed, as he and his colleagues Brando Miranda, a Stanford PhD student, and Sanmi Koyejo, an assistant professor of computer science, show, the perception of AI’s emergent abilities is based on the metrics that have been used. “The mirage of emergent abilities only exists because of the programmers' choice of metric,” Schaeffer says. “Once you investigate by changing the metrics, the mirage disappears.”

Finding the Mirage

Schaeffer began wondering if AI’s alleged emergent abilities were real while attending a lecture describing them. “I noticed in the lecture that many claimed emergent abilities seemingly appeared when researchers used certain very specific ways of evaluating those models,” he says.

Specifically, these metrics evaluate the performance of smaller models more harshly, making it appear as if novel and unpredictable abilities arise as the models get bigger. Indeed, graphs of these metrics display a sharp change in performance at a particular model size, which is why emergent properties are sometimes called “sharp left turns.”

But when Schaeffer and his colleagues used other metrics that measured the abilities of smaller and larger models more fairly, the leap attributed to emergent properties was gone. In the paper, published April 28 on the preprint service arXiv, the team looked at 29 different metrics for evaluating model performance. Twenty-five of them show no emergent properties. Instead, they reveal continuous, linear growth in model abilities as model size grows.

And there are simple explanations for why the other four metrics incorrectly suggest the existence of emergent properties. “They’re all sharp, deforming, non-continuous metrics,” Schaeffer says. “They are very harsh judges.” Indeed, using the metric known as “exact string match,” even a simple math problem will appear to develop emergent abilities at scale, Schaeffer says. For example, imagine doing an addition problem and making an error that’s off by one digit. The exact string match metric will view that mistake as being just as bad as an error that’s off by one billion digits. The result: a disregard for the ways that small models gradually improve as they scale up, and the appearance that large models make great leaps ahead. 
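To make the effect concrete, here is a minimal numerical sketch (an illustration of the point, not code from the paper; the 10-digit answer length and the accuracy values are invented). It assumes a model’s chance of getting each digit of an addition answer right improves smoothly with scale, and shows how exact string match, which gives credit only for a perfectly correct answer, turns that smooth improvement into what looks like a sudden jump.

```python
import numpy as np

# Toy illustration (not the authors' code): suppose a model must emit a
# 10-digit answer, and its chance of getting any single digit right rises
# smoothly as the model scales up. A "harsh" metric like exact string match
# gives credit only when every digit is correct, so accuracy under that
# metric stays near zero for a long time and then shoots up.
num_digits = 10
per_digit_accuracy = np.linspace(0.50, 0.99, 20)        # smooth improvement with scale
exact_match_accuracy = per_digit_accuracy ** num_digits  # all digits must be right

for p, em in zip(per_digit_accuracy, exact_match_accuracy):
    print(f"per-digit accuracy {p:.2f} -> exact-match accuracy {em:.3f}")
```

Under the per-digit view, the ability climbs steadily; under exact string match, the score sits near zero until per-digit accuracy gets very high and then rises steeply, the same “sharp left turn” shape that gets read as emergence.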

Schaeffer and his colleagues also noticed that no one has claimed that large vision models exhibit emergent properties. As it turns out, vision researchers don’t use the harsh metrics favored by natural language researchers. When Schaeffer applied those harsh metrics to a vision model, voilà, the mirage of emergence appeared.

Artificial General Intelligence Will Be Foreseeable

This is the first time an in-depth analysis has shown that the highly publicized story of LLMs’ emergent abilities springs from the use of harsh metrics. But it’s not the first time anyone has hinted at that possibility. Google’s recent paper “Beyond the Imitation Game” suggested that metrics might be the issue. And after Schaeffer’s paper came out, a research scientist working on LLMs at OpenAI tweeted that the company has made similar observations. 

What it means for the future is this: We don’t need to worry about accidentally stumbling onto artificial general intelligence (AGI). Yes, AGI may still have huge consequences for human society, Schaeffer says, “but if it emerges, we should be able to see it coming.”

Stanford HAI’s mission is to advance AI research, education, policy and practice to improve the human condition. Learn more. 

Contributor(s)
Katharine Miller

Related News

Fei-Fei Li Wins Queen Elizabeth Prize for Engineering
Shana Lynch | Nov 07, 2025 | Computer Vision, Machine Learning
The Stanford HAI co-founder is recognized for breakthroughs that propelled computer vision and deep learning, and for championing human-centered AI and industry innovation.

Offline “Studying” Shrinks the Cost of Contextually Aware AI
Andrew Myers | Sep 29, 2025 | Foundation Models, Machine Learning
By having AI study a user’s context offline, researchers dramatically reduce the memory and cost required to make AI contextually aware.

BEHAVIOR Challenge Charts the Way Forward for Domestic Robotics
Andrew Myers | Sep 22, 2025 | Robotics, Machine Learning
With a first-of-its-kind competition for roboticists everywhere, researchers at Stanford are hoping to push domestic robotics into a new age of autonomy and capability.