New Approach to Scaling Laws Could Change How AI Models Are Trained | Stanford HAI

New Approach to Scaling Laws Could Change How AI Models Are Trained

Date: May 21, 2026
Topics: Natural Language Processing, Generative AI

Leveraging statistical concepts from measurement science and education, AI researchers have greatly reduced the computational demand of predicting how the largest of large language models will scale up in the future. The approach could save millions of dollars in training costs.

While Big Tech is tight-lipped on how much it costs to train large language models like ChatGPT, Claude, or Gemini, estimates range from hundreds of millions to a billion dollars for each training run. That steep cost means AI developers would prefer to train their new models only once.

To rein in costs and increase confidence in these massive singular training runs, developers have come to rely upon what are known as scaling laws. They train and test many smaller models, then use the results to predict how a much larger model will perform once it is scaled up. Scaling laws have become essential AI infrastructure, but even these techniques require expensive compute.
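To make the idea of extrapolating from small models concrete, here is a minimal sketch of one common form of scaling law: a power law fit in log-log space. The article does not state which functional form the researchers use, so the power-law shape, the function names, and the synthetic numbers below are all assumptions for illustration, not the study's method or data.

```python
import math

def fit_power_law(compute, loss):
    """Fit loss ~ a * compute**(-b) by ordinary least squares on the logs."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(l) for l in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope  # (a, b)

def predict_loss(a, b, compute):
    """Extrapolate the fitted curve to a new compute budget."""
    return a * compute ** (-b)

# Synthetic, noiseless "small model" results (hypothetical values).
small_compute = [1e17, 1e18, 1e19, 1e20]
small_loss = [50.0 * c ** (-0.07) for c in small_compute]

a, b = fit_power_law(small_compute, small_loss)
predicted = predict_loss(a, b, 1e23)  # forecast for a far larger run
```

In practice the expensive part is producing the small-model measurements that feed a fit like this, which is the cost the new work targets.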

Now, scholars have developed a new approach to scaling laws that significantly reduces the compute needed to fit them, lowering the time and cost of predicting how models will scale.

“Before scaling laws were proven, the best-known developers gambled and bet the farm on them, and it happened to work out for them. They made big strategic decisions about how to tweak and design their models and used scaling laws to extrapolate performance, and they were right. But scaling was still expensive, just less expensive than the alternative,” says Sanmi Koyejo, assistant professor of computer science and senior author of a new study accepted at the International Conference on Machine Learning that introduces a clever way to improve scaling while reducing computational demands by as much as 99%.

“The core question we study is quite simple,” says Sang Truong, a graduate student in Koyejo’s lab and first author of the paper, “Can we use algorithms to improve scaling?”

Essential Architecture

In their new paper, Koyejo, Truong, and colleagues show how they can tailor scaling algorithms to reduce computational demand significantly. They call their framework Item Response Scaling Laws (IRSL). It builds on the same concept used by standardized academic assessments like the SAT.

Borrowing principles from measurement science (psychometrics) and education, IRSL models the relationship between test takers (here, the AI models) and the questions they are asked, raising question difficulty in successive rounds as the model answers correctly. This significantly reduces the number of queries needed to accurately estimate ability, Koyejo says. The researchers show that IRSL achieves equal or greater predictive accuracy with far fewer queries – saving time and money while improving performance.
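The adaptive idea described above can be sketched with a toy example. This is not the IRSL algorithm from the paper. It is a minimal illustration of adaptive testing that assumes the simplest item response model (the one-parameter Rasch model), a hypothetical bank of questions with known difficulties, and a stand-in "model" that answers correctly whenever a question is easier than a fixed threshold. All names and numbers are invented for illustration.

```python
import math

def p_correct(ability, difficulty):
    """Rasch (1PL) model: probability that a test taker answers correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def estimate_ability(responses, lo=-6.0, hi=6.0, steps=60):
    """Maximum-likelihood ability estimate, found by bisection.

    responses is a list of (difficulty, correct) pairs. The estimate is
    clamped to [lo, hi], which handles all-correct or all-wrong records.
    """
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        # Derivative of the log-likelihood; it decreases as ability rises.
        gradient = sum(float(correct) - p_correct(mid, d)
                       for d, correct in responses)
        if gradient > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def adaptive_test(answer_fn, difficulties, n_items):
    """Ask n_items questions, each chosen near the current ability estimate."""
    remaining = list(difficulties)
    responses = []
    ability = 0.0  # start in the middle of the scale
    for _ in range(min(n_items, len(remaining))):
        # Under the Rasch model, the most informative unused question is
        # the one whose difficulty is closest to the current estimate.
        item = min(remaining, key=lambda d: abs(d - ability))
        remaining.remove(item)
        responses.append((item, bool(answer_fn(item))))
        ability = estimate_ability(responses)
    return ability, responses

# Hypothetical question bank: 201 difficulties from -5.0 to 5.0.
bank = [k / 20 for k in range(-100, 101)]
TRUE_THRESHOLD = 1.5  # the stand-in model solves anything easier than this
ability, responses = adaptive_test(lambda d: d < TRUE_THRESHOLD, bank, n_items=20)
```

In this toy run the test asks only 20 of the 201 available questions, and the questions it chooses cluster near the stand-in model's threshold, where each answer is most informative.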

It’s a sort of statistical shortcut. Rather than asking every question of every model multiple times, Koyejo and Truong extract more information from far fewer responses. The potential questions in a traditional scaling analysis can number 10,000 or more. Multiplied by the number of models and the number of times answers must be sampled, a scaling run could include 10 trillion queries. IRSL, on the other hand, delivers equivalent accuracy using as few as 50 questions – a reduction of more than 99 percent.
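The arithmetic behind that figure is simple to check. The sketch below uses the article's question counts (10,000 and 50). The number of models and the number of samples per question are hypothetical placeholders, chosen only to show that they cancel out of the percentage when they are held fixed.

```python
def query_count(num_questions, num_models, samples_per_question):
    """Total queries when every model answers every question repeatedly."""
    return num_questions * num_models * samples_per_question

def percent_reduction(baseline, reduced):
    """Percentage of the baseline that is saved."""
    return 100.0 * (baseline - reduced) / baseline

# Hypothetical values, not figures from the study.
NUM_MODELS = 1_000
SAMPLES_PER_QUESTION = 10

full_run = query_count(10_000, NUM_MODELS, SAMPLES_PER_QUESTION)
short_run = query_count(50, NUM_MODELS, SAMPLES_PER_QUESTION)

reduction = percent_reduction(full_run, short_run)
question_only_reduction = percent_reduction(10_000, 50)
```

Both calculations come out to about 99.5 percent, consistent with the article's "more than 99 percent."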

Beyond Big Tech

“Under existing frameworks, you often had to run thousands of smaller models across tens of thousands of benchmark questions to predict outcomes,” Truong explains. “Our approach makes this process dramatically more efficient and more reliable. In some cases, doing less computational work improves predictive results.”

Koyejo predicts IRSL’s impact will be greatest in the academic world, where the costs of training can be prohibitive, but deep-pocketed private developers could benefit, too. The goal is to provide researchers with new tools to help them reason about scaling in a scientific and statistically rigorous way, Truong says.

“We believe Item Response Scaling Laws is an important step forward,” Koyejo concludes. “It shows that you can refine scaling – and training in general. It gives you the counterintuitive combination of a better signal with less work.”

Contributing authors include graduate students Rylan Schaeffer at Stanford and Yuheng Tu of the University of California, Los Angeles.

This work was made possible by funding from the National Science Foundation, ARPA-H, the MacArthur Foundation, Schmidt Sciences, the Stanford Institute for Human-Centered Artificial Intelligence (HAI), OpenAI, Microsoft, and Google.

Contributor(s)
Andrew Myers
