Offline “Studying” Shrinks the Cost of Contextually Aware AI | Stanford HAI


Date: September 29, 2025
Topics: Foundation Models, Machine Learning

By having AI study a user’s context offline, researchers dramatically reduce the memory and cost required to make AI contextually aware.

Imagine you are a lawyer using an AI chatbot to help you prepare a brief. You upload some relevant legal documents to your favorite bot and start issuing prompts. Soon, the bot is chugging away. What you don’t see, however, is the tremendous memory, compute power, and energy being consumed in the background.

“The AI’s internal representation of a 70,000-word legal document might consume more than 100 gigabytes of precious GPU memory. To put into context how large this representation is, the raw text from that same document is only 400 kilobytes — 250,000 times smaller,” explains Sabri Eyuboglu, a doctoral student in computer science at Stanford University. “All of this memory consumption makes it really costly and slow to produce the chatbot’s response.”
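A back-of-envelope calculation shows where numbers like that come from: a transformer's KV cache stores a key vector and a value vector per token at every layer, so memory grows linearly with context length. The hyperparameters below are illustrative assumptions of ours (a large model with full multi-head attention in fp16), not figures from the paper:

```python
def kv_cache_bytes(tokens: int, layers: int, kv_heads: int,
                   head_dim: int, dtype_bytes: int = 2) -> int:
    """Bytes needed to cache keys and values for `tokens` of context."""
    # The factor of 2 covers the key vector plus the value vector per layer.
    return tokens * layers * kv_heads * head_dim * 2 * dtype_bytes

# A 70,000-word document is roughly 93,000 tokens (~0.75 words per token).
tokens = 93_000
# Assumed 70B-class config: 80 layers, 64 KV heads of dimension 128, fp16.
size = kv_cache_bytes(tokens, layers=80, kv_heads=64, head_dim=128)
print(f"{size / 1e9:.0f} GB")  # -> 244 GB under these assumptions
```

Grouped-query attention and quantization shrink this considerably, but the cache still grows linearly with context, which is the scaling problem Cartridges target.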

Eyuboglu is the first author of a new preprint paper proposing an intriguing solution to this problem: "Cartridges." A Cartridge is a compact memory module that is trained offline to represent a document or other text, allowing an AI bot to answer queries quickly without reanalyzing the full document. Eyuboglu says that Cartridges work for any large body of textual information: legal documents, computer code, personal files, textbooks, patient medical records, and more.

“Today’s AI systems do a good job adapting their responses to a small amount of context — think a few pages of text. But, unfortunately, the performance and efficiency of today’s systems degrade as context grows,” says author Simran Arora. “With Cartridges, we’re exploring ways to more efficiently and effectively scale up the amount of context we can provide to the model.”

By storing context in these compact Cartridges, Arora and Eyuboglu, along with their co-authors and their advisor, Professor Chris Ré, found they could shrink memory requirements by orders of magnitude. Cartridges, they say, use almost 40 times less memory and boost the bot’s words-per-second output by more than 25 times compared with conventional in-context learning (ICL). The research was partially funded by the Stanford Institute for Human-Centered AI.

New Horizons

The innovation sprang from a relatively modest concept. “Since the same documents are often referenced by many queries, let’s invest a ton of compute up front to prepare the Cartridges,” Eyuboglu says. “Then, as we get more queries down the line, we can respond very quickly.”

This is not the first time researchers have tried to lighten AI’s memory load, but prior efforts invested comparatively little compute in the compression process. Memory footprints were smaller, yes, but those gains came at a high cost — the answers were worse. In contrast, Cartridges consume less memory while still producing high-quality answers. This is possible because they are produced in a very compute-intensive process. “This trade-off is desirable when contexts are shared across many queries and the cost of producing the Cartridges can be shared,” Eyuboglu says.
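That trade-off reduces to a simple break-even calculation: the one-time compression cost pays for itself once enough queries hit the same document. The cost figures below are hypothetical, chosen only to illustrate the amortization logic:

```python
def break_even_queries(cartridge_train_cost: float,
                       icl_cost_per_query: float,
                       cartridge_cost_per_query: float) -> float:
    """Queries needed before the up-front Cartridge cost is recouped."""
    saving_per_query = icl_cost_per_query - cartridge_cost_per_query
    return cartridge_train_cost / saving_per_query

# Hypothetical costs in arbitrary compute units: training the Cartridge
# costs 900, each ICL query costs 10, each Cartridge query costs 1.
n = break_even_queries(cartridge_train_cost=900.0,
                       icl_cost_per_query=10.0,
                       cartridge_cost_per_query=1.0)
print(n)  # -> 100.0 queries to break even
```

Past that point, every additional query against the shared document is nearly pure savings, which is why the approach suits documents that are referenced again and again.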

In effect, Cartridges train themselves through a key innovation the team calls “self-study.” With self-study, the model doesn’t simply memorize the text; it carries on a conversation with itself about the document, essentially simulating the queries a real user might ask. These conversations are then baked into the Cartridge using standard training algorithms. In this way, Cartridges can be used across diverse prompts, which saves time, effort, and memory down the road, Eyuboglu says.

“If you just train only on the context with a simple objective like next-token prediction, you could memorize the document, but you’d only be able to regurgitate it,” Eyuboglu says. “What the synthetic conversations do — what self-study does — is critical for allowing the model to actually answer general questions and tasks quickly and accurately at a later point in time.”
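The paper's training details go beyond a news article, but the shape of the self-study loop can be sketched. Everything below is a structural simplification of ours: `ask_about` and `answer_from` are hypothetical stand-ins for calls to the language model, and the real method then distills the resulting conversations into a compact KV-cache-like module by gradient descent:

```python
import random

def ask_about(chunk: str) -> str:
    """Stand-in: the model poses a question a user might ask about `chunk`."""
    return f"What does this passage say about: {chunk[:30]}...?"

def answer_from(chunk: str, question: str) -> str:
    """Stand-in: the model answers its own question using the chunk."""
    return f"(answer grounded in: {chunk[:30]}...)"

def self_study(document: str, num_rounds: int, chunk_size: int = 500):
    """Generate synthetic conversations about `document`.

    The (question, answer) pairs become the training data that is
    baked into the Cartridge with standard training algorithms.
    """
    chunks = [document[i:i + chunk_size]
              for i in range(0, len(document), chunk_size)]
    conversations = []
    for _ in range(num_rounds):
        chunk = random.choice(chunks)   # study a random passage
        q = ask_about(chunk)            # simulate a user query
        a = answer_from(chunk, q)       # answer it from the document
        conversations.append((q, a))
    return conversations
```

The key point the sketch captures is that the training signal is question-answering about the document, not rote next-token prediction over it, which is what lets the finished Cartridge handle general queries rather than only regurgitating text.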

Next Steps

Cartridges are by no means free. Self-study requires a powerful multi-GPU system, and training a Cartridge initially consumes more energy than simply processing the document; each subsequent query, however, consumes far less. Crucially, that training can happen offline, when compute power is cheap or in lower demand, and a finished Cartridge can be reused across countless queries of the same body of text.

Future directions Eyuboglu hints at include more efficient training of Cartridges, real-world deployment in specific domains like medicine and law, and perhaps even standard libraries of Cartridges for public use. He notes that Cartridges trained on different texts can be combined, an intriguing finding that could propel future research.
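Composability has a natural reading if a Cartridge is pictured as a trained prefix of key-value entries: two prefixes can be concatenated along the sequence axis so the model attends over both contexts at once. A minimal sketch under that assumption (the list-of-entries representation is our simplification, not the paper's):

```python
def compose(cartridge_a: list, cartridge_b: list) -> list:
    """Concatenate two Cartridges along the sequence dimension so a
    model attending over the result sees both contexts at once."""
    return cartridge_a + cartridge_b

# Hypothetical trained entries standing in for per-position KV pairs.
law_cartridge = ["kv_law_0", "kv_law_1"]
med_cartridge = ["kv_med_0"]
combined = compose(law_cartridge, med_cartridge)
print(len(combined))  # -> 3 entries spanning both documents
```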

What’s most exciting to the team is that Cartridges may present a scalable and sustainable path toward AI systems that are personalized and continually learn from the user’s context.

“The recent history of AI has been all about building huge monolithic models that are the same for everyone,” Eyuboglu concludes. “I think we’re starting to see the limits of that approach. With this work, we’re providing evidence that self-study techniques could present a scalable path forward.”

Contributor: Andrew Myers

Related News

America's 250 Greatest Innovators: Celebrating The American Dream
Forbes | Media Mention | Feb 11, 2026
HAI Co-Director Fei-Fei Li named one of America's top 250 greatest innovators, alongside fellow Stanford affiliates Rodney Brooks, Carolyn Bertozzi, Daphne Koller, and Andrew Ng.

Smart Enough to Do Math, Dumb Enough to Fail: The Hunt for a Better AI Test
Andrew Myers | News | Feb 02, 2026
A Stanford HAI workshop brought together experts to develop new evaluation methods that assess AI's hidden capabilities, not just its test-taking performance.

AI Leaders Discuss How To Foster Responsible Innovation At TIME100 Roundtable In Davos
TIME | Media Mention | Jan 21, 2026
HAI Senior Fellow Yejin Choi discussed responsible AI model training at Davos, asking, “What if there could be an alternative form of intelligence that really learns … morals, human values from the get-go, as opposed to just training LLMs on the entirety of the internet, which actually includes the worst part of humanity, and then we then try to patch things up by doing ‘alignment’?”