New Approach to Scaling Laws Could Change How AI Models Are Trained

Date

May 21, 2026

Leveraging statistical concepts from measurement science and education, AI researchers have greatly reduced the computational demand of predicting how the largest of large language models will scale up in the future. The approach could save millions of dollars in training costs.

While Big Tech is tight-lipped on how much it costs to train large language models like ChatGPT, Claude, or Gemini, estimates range from hundreds of millions to a billion dollars for each training run. That steep cost means AI developers would prefer to train their new models only once.

To rein in costs and increase confidence in these massive, one-shot training runs, developers have come to rely on what are known as scaling laws: they evaluate many smaller models, then use those results to predict how a full-size language model will perform as it scales up. Scaling laws have become essential AI infrastructure, but even these techniques require expensive compute.

Now, scholars have developed a new approach to scaling laws that significantly reduces computational demands, lowering the time and cost of predicting how models will scale.

“Before scaling laws were proven, the best-known developers gambled and bet the farm on them, and it happened to work out for them. They made big strategic decisions about how to tweak and design their models and used scaling laws to extrapolate performance, and they were right. But scaling was still expensive, just less expensive than the alternative,” says Sanmi Koyejo, assistant professor of computer science and senior author of a new study accepted at the International Conference on Machine Learning. The study introduces a clever way to improve scaling while reducing computational demands by as much as 99%.

“The core question we study is quite simple,” says Sang Truong, a graduate student in Koyejo’s lab and first author of the paper. “Can we use algorithms to improve scaling?”

Essential Architecture

In their new paper, Koyejo, Truong, and colleagues show how scaling algorithms can be tailored to reduce computational demand significantly. They call their framework Item Response Scaling Laws (IRSL). It draws on the same concept used by standardized academic assessments like the SAT.

Borrowing principles from measurement science (psychometrics) and education, IRSL builds from the relationship of test takers to the questions they are asked, increasing question difficulty with successive rounds as the model answers correctly. This significantly reduces the number of queries needed to accurately estimate ability, Koyejo says. The researchers show that IRSL achieves equal or greater predictive accuracy with far fewer queries – saving time and money while improving performance.
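
To make the adaptive idea concrete, here is a minimal sketch of computerized adaptive testing under a Rasch (one-parameter) model. It is a generic illustration of the testing concept described above, not the IRSL algorithm from the paper. The function names, the weak prior, the grid search, and the simulated question bank are all assumptions chosen for brevity.

```python
import math
import random

def p_correct(ability, difficulty):
    """Rasch model: probability that a test taker answers an item correctly."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

def estimate_ability(responses, grid):
    """Pick the grid point that maximizes the log-posterior of the ability.

    A weak Normal(0, 2^2) prior (an assumption of this sketch) keeps the
    estimate finite when every answer so far is right or every one is wrong.
    """
    def log_posterior(theta):
        total = -theta ** 2 / (2 * 2.0 ** 2)
        for difficulty, correct in responses:
            p = p_correct(theta, difficulty)
            total += math.log(p if correct else 1.0 - p)
        return total
    return max(grid, key=log_posterior)

def adaptive_test(answer_fn, difficulties, n_questions=50):
    """Ask only n_questions items, each chosen near the current estimate."""
    grid = [i / 10 for i in range(-40, 41)]  # candidate abilities, -4.0 to 4.0
    remaining = list(difficulties)
    responses = []
    theta = 0.0
    for _ in range(min(n_questions, len(remaining))):
        # Under a Rasch model, the most informative item is the one whose
        # difficulty is closest to the current ability estimate.
        item = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(item)
        responses.append((item, answer_fn(item)))
        theta = estimate_ability(responses, grid)
    return theta, len(responses)

# Simulated example: a bank of 10,000 questions and one hypothetical model.
rng = random.Random(0)
bank = [rng.uniform(-4, 4) for _ in range(10_000)]
true_ability = 1.5
simulated_model = lambda d: rng.random() < p_correct(true_ability, d)

estimate, asked = adaptive_test(simulated_model, bank, n_questions=50)
print(f"asked {asked} of {len(bank)} questions, estimated ability {estimate:.1f}")
```

Because a correct answer raises the estimate and an incorrect one lowers it, the selected questions get harder as the simulated model succeeds and easier as it fails, so each query is spent near the level where the answer is most uncertain.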

It’s a sort of statistical shortcut. Rather than asking every question of every model multiple times, Koyejo and Truong use information more efficiently. The potential questions in traditional scaling can number 10,000 or more. Multiplied by the number of models and the number of times answers must be sampled, a scaling run could include 10 trillion queries. IRSL, on the other hand, delivers equivalent accuracy using as few as 50 questions – a reduction of more than 99 percent.
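
The per-model arithmetic behind that percentage can be checked directly. This snippet uses only the two question counts quoted above; the variable names are illustrative. It does not try to reproduce the 10 trillion total, which depends on model and sample counts the article does not give.

```python
questions_traditional = 10_000  # "10,000 or more" questions per model
questions_irsl = 50             # "as few as 50" questions per model

# Fraction of per-model questions that no longer need to be asked.
reduction = 1 - questions_irsl / questions_traditional
print(f"{reduction:.1%}")  # prints 99.5%
```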

Beyond Big Tech

“Under existing frameworks, you often had to run thousands of smaller models across tens of thousands of benchmark questions to predict outcomes,” Truong explains. “Our approach makes this process dramatically more efficient and more reliable. In some cases, doing less computational work improves predictive results.”

Koyejo predicts IRSL’s impact will be greatest in the academic world, where the costs of training can be prohibitive, but deep-pocketed private developers could benefit, too. The goal is to provide researchers new tools to help them reason about scaling in a scientific and statistically rigorous way, Truong says.

“We believe Item Response Scaling Laws is an important step forward,” Koyejo concludes. “It shows that you can refine scaling – and training in general. It gives you the counterintuitive combination of a better signal with less work.”

Contributing authors include graduate students Rylan Schaeffer at Stanford and Yuheng Tu of the University of California, Los Angeles.

This work was made possible by funding from the National Science Foundation, ARPA-H, the MacArthur Foundation, Schmidt Sciences, the Stanford Institute for Human-Centered Artificial Intelligence (HAI), OpenAI, Microsoft, and Google.

Related News

Stanford Study Exposes Major Flaw in AI Mental Health Safety Testing

Andrew Myers

Jul 13, 2026

News

With increased use of chatbots in mental health contexts, AI developers now rely on human experts to evaluate AI’s responses for “safety” – but experts rarely agree on what’s safe.

HAI Student Affinity Groups Take On Society’s Emerging Questions

Madeleine Wright

Jun 26, 2026

News

Stanford students across disciplines are teaming up to tackle society’s pressing questions in the age of AI.

Today's AI Talks Like “Nobody.” New Research Gives It Real Personality.

Jun 08, 2026

News

PsychAdapter lets researchers dial in on personality traits, age, and mental health characteristics to generate text that sounds like real individuals, opening the door to training simulations and personalized content.
