Offline “Studying” Shrinks the Cost of Contextually Aware AI

Date: September 29, 2025
Topics: Foundation Models, Machine Learning

By having AI study a user’s context offline, researchers dramatically reduce the memory and cost required to make AI contextually aware.

Imagine you are a lawyer using an AI chatbot to help you prepare a brief. You upload some relevant legal documents to your favorite bot and start issuing prompts. Soon, the bot is chugging away. What you don’t see, however, is the tremendous memory, compute power, and energy being consumed in the background.

“The AI’s internal representation of a 70,000-word legal document might consume more than 100 gigabytes of precious GPU memory. To put into context how large this representation is, the raw text from that same document is only 400 kilobytes — 250,000 times smaller,” explains Sabri Eyuboglu, a doctoral student in computer science at Stanford University. “All of this memory consumption makes it really costly and slow to produce the chatbot’s response.”
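To make the arithmetic concrete, here is a back-of-envelope sketch of how a transformer’s key-value (KV) cache balloons relative to raw text. The model dimensions below are assumptions for a large 70B-class model with standard multi-head attention, not figures from the paper; with these assumptions the estimate lands in the hundreds of gigabytes, in line with the “more than 100 gigabytes” Eyuboglu describes.

```python
# Back-of-envelope KV-cache estimate (all model dimensions are assumptions).
WORDS = 70_000
TOKENS = int(WORDS * 1.3)   # rough words-to-tokens ratio (assumption)
LAYERS = 80                 # decoder layers, 70B-class model (assumption)
HIDDEN = 8_192              # model width (assumption)
BYTES_PER_VALUE = 2         # fp16/bf16 storage

# Every token stores one key vector and one value vector per layer.
kv_bytes = 2 * LAYERS * TOKENS * HIDDEN * BYTES_PER_VALUE
raw_bytes = 400 * 1_000     # ~400 KB of raw text, per the article

print(f"KV cache : {kv_bytes / 1e9:,.0f} GB")   # ~240 GB with these assumptions
print(f"Raw text : {raw_bytes / 1e3:,.0f} KB")
print(f"Blow-up  : {kv_bytes / raw_bytes:,.0f}x")
```

Grouped-query attention and quantization shrink this footprint, but the internal representation still dwarfs the text it encodes.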

Eyuboglu is the first author of a new preprint paper with an intriguing solution to this problem, which he calls “Cartridges.” A Cartridge is a compact memory module that is trained offline to represent a document or other text, allowing an AI bot to answer queries quickly without reanalyzing the full document. Eyuboglu says that Cartridges work for any large body of textual information — legal documents, computer code, personal files, textbooks, patient medical records, and more.
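As a rough sketch of the intended workflow (the names and interfaces below are hypothetical, not the paper’s API): the expensive step happens once, offline, and every later query reuses the compact result.

```python
# Hypothetical Cartridge workflow; names and interfaces are illustrative.
class Cartridge:
    """A compact, trained stand-in for the model's representation of a document."""
    def __init__(self, state: bytes):
        self.state = state

def train_cartridge_offline(document: str) -> Cartridge:
    # Expensive, one-time step: compress the document into a small state.
    # (Placeholder logic only; the real method trains the state via self-study.)
    return Cartridge(state=document[:64].encode())

def answer(query: str, cartridge: Cartridge) -> str:
    # Cheap, repeated step: decode against the small cartridge instead of
    # re-processing the full document on every request.
    return f"(answer to {query!r} using a {len(cartridge.state)}-byte state)"

doc = "Plaintiff filed the brief on ..."      # imagine 70,000 words here
cartridge = train_cartridge_offline(doc)      # offline, compute-heavy
print(answer("What is the filing deadline?", cartridge))  # online, fast
```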

“Today’s AI systems do a good job adapting their responses to a small amount of context — think a few pages of text. But, unfortunately, the performance and efficiency of today’s systems degrade as context grows,” says author Simran Arora. “With Cartridges, we’re exploring ways to more efficiently and effectively scale up the amount of context we can provide to the model.”

By storing context in these compact Cartridges, Arora and Eyuboglu, along with their co-authors and their advisor, Professor Chris Ré, found they could shrink memory requirements by orders of magnitude. Cartridges, they say, use almost 40 times less memory and boost the bot’s words-per-second output by more than 25 times compared with conventional in-context learning (ICL) methods. The research was partially funded by the Stanford Institute for Human-Centered AI.

New Horizons

The innovation sprang from a relatively modest concept. “Since the same documents are often referenced by many queries, let’s invest a ton of compute up front to prepare the Cartridges,” Eyuboglu says. “Then, as we get more queries down the line, we can respond very quickly.”

This is not the first time researchers have tried to lighten AI’s memory load, but prior efforts invested comparatively little compute in the compression process. Memory footprints were smaller, yes, but those gains came at a high cost — the answers were worse. In contrast, Cartridges consume less memory while still producing high-quality answers. This is possible because they are produced in a very compute-intensive process. “This trade-off is desirable when contexts are shared across many queries and the cost of producing the Cartridges can be shared,” Eyuboglu says.
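A toy cost model makes the trade-off explicit. Every number below is invented for illustration; the point is only that a one-time investment is amortized as queries over the same document accumulate.

```python
# Toy amortization model; all numbers are assumptions for illustration.
UPFRONT_CARTRIDGE = 1_000.0   # one-time compute to self-study and train
PER_QUERY_CARTRIDGE = 1.0     # cheap: decode against a small memory
PER_QUERY_ICL = 40.0          # costly: re-process the document each time

for n in (10, 100, 1_000):
    cartridge_total = UPFRONT_CARTRIDGE + n * PER_QUERY_CARTRIDGE
    icl_total = n * PER_QUERY_ICL
    print(f"{n:>5} queries -> cartridge: {cartridge_total:>7.0f}, ICL: {icl_total:>7.0f}")

# Break-even here is ~26 queries; past that, the cartridge pulls ahead
# and the gap widens with every additional query over the same document.
```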

In effect, Cartridges train themselves through a key innovation the team calls “self-study.” With self-study, the model doesn’t simply memorize the text; it carries on a conversation with itself about the document, essentially simulating the queries a real user might ask. These conversations are then baked into the Cartridge using standard training algorithms. In this way, Cartridges can be used across diverse prompts, which saves time, effort, and memory down the road, Eyuboglu says.

“If you just train only on the context with a simple objective like next-token prediction, you could memorize the document, but you’d only be able to regurgitate it,” Eyuboglu says. “What the synthetic conversations do — what self-study does — is critical for allowing the model to actually answer general questions and tasks quickly and accurately at a later point in time.”
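In code, the self-study loop might look something like the sketch below. The stub model call and the prompt wording are assumptions for illustration, not the paper’s exact recipe; the resulting synthetic conversations are what get baked into the Cartridge with standard training.

```python
import random

def llm(prompt: str) -> str:
    """Stand-in for a real language-model call (assumption for illustration)."""
    return "..."

def self_study(document: str, rounds: int = 100) -> list[tuple[str, str]]:
    """Have the model quiz itself about the document, simulating user queries."""
    conversations = []
    for _ in range(rounds):
        # Sample a passage so the synthetic questions cover the whole document.
        start = random.randrange(max(1, len(document) - 2_000))
        passage = document[start:start + 2_000]
        question = llm(f"Ask a question a reader might have about:\n{passage}")
        # Answer with the full document in context so the target is grounded.
        answer = llm(f"{document}\n\nQ: {question}\nA:")
        conversations.append((question, answer))
    return conversations

# The (question, answer) pairs are then used to train the Cartridge, so the
# compact state supports general queries rather than rote regurgitation.
```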

Next Steps

Cartridges are by no means free. Self-study requires a powerful multi-GPU system, and a Cartridge consumes more energy to train up front than conventional approaches do, though regular use afterward requires far less. Crucially, that training can happen offline, when compute power is cheap or in lower demand, and the resulting Cartridge can be reused across countless queries of the text it encodes.

Future directions Eyuboglu hints at include more efficient training of Cartridges, real-world deployment in specific domains like medicine and law, and perhaps even standard libraries of Cartridges for public use. He also notes that Cartridges trained on different texts can be combined, an intriguing finding that could propel future research.

What’s most exciting to the team is that Cartridges may present a scalable and sustainable path toward AI systems that are personalized and continually learn from the user’s context.

“The recent history of AI has been all about building huge monolithic models that are the same for everyone,” Eyuboglu concludes. “I think we’re starting to see the limits of that approach. With this work, we’re providing evidence that self-study techniques could present a scalable path forward.”

Contributor(s)
Andrew Myers

Related News

Fei-Fei Li Wins Queen Elizabeth Prize for Engineering
Shana Lynch
Nov 07, 2025
News

The Stanford HAI co-founder is recognized for breakthroughs that propelled computer vision and deep learning, and for championing human-centered AI and industry innovation.

BEHAVIOR Challenge Charts the Way Forward for Domestic Robotics
Andrew Myers
Sep 22, 2025
News

With a first-of-its-kind competition for roboticists everywhere, researchers at Stanford are hoping to push domestic robotics into a new age of autonomy and capability.

Stanford AI Scholars Find Support for Innovation in a Time of Uncertainty
Nikki Goth Itoi
Jul 01, 2025
News

Stanford HAI offers critical resources for faculty and students to continue groundbreaking research across the vast AI landscape.
