A New Approach to the Data-Deletion Conundrum

Date: September 24, 2021
Topics: Machine Learning

A team of computer scientists devised a way to quickly remove traces of sensitive user information from machine learning models.

Rising consumer concern over data privacy has led to a rush of “right to be forgotten” laws around the world that allow individuals to request their personal data be expunged from massive databases that catalog our increasingly online lives. Researchers in artificial intelligence have observed that user data does not exist only in its raw form in a database; it is also implicitly contained in models trained on that data. So far, they have struggled to find methods for deleting these “traces” of users efficiently. The more complex the model is, the more challenging it becomes to delete data.

“The exact deletion of data — the ideal — is hard to do in real time,” says James Zou, a professor of biomedical data science at Stanford University and an expert in artificial intelligence. “In training our machine learning models, bits and pieces of data can get embedded in the model in complicated ways. That makes it hard for us to guarantee a user has truly been forgotten without altering our models substantially.”

Zou is the senior author of a paper recently presented at the International Conference on Artificial Intelligence and Statistics (AISTATS) that offers a possible answer to the data-deletion problem, one that works for privacy-concerned individuals and artificial intelligence experts alike. They call it approximate deletion.

Read the study: Approximate Data Deletion from Machine Learning Models


“Approximate deletion, as the name suggests, allows us to remove most of the users’ implicit data from the model. They are ‘forgotten,’ but in such a way that we can do the retraining of our models at a later, more opportune time,” says Zach Izzo, a graduate student in mathematics and the first author of the AISTATS paper.

Approximate deletion is especially useful in quickly removing sensitive information or features unique to a given individual that could potentially be used for identification after the fact, while postponing the computationally intensive full model retraining to times of lower computational demand. Under certain assumptions, Zou says, approximate deletion even achieves the holy grail of exact deletion of a user’s implicit data from the trained model.

Driven by Data

Machine learning works by combing databases and applying various predictive weights to features in the data — an online shopper’s age, location, and previous purchase history, for instance, or a streamer’s past viewing history and personal ratings of movies watched. The models are not confined to commercial applications and are now widely used in radiology, pathology, and other fields of direct human impact.
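To make those “traces” concrete, here is a minimal sketch (illustrative only, not code from the paper) of a toy linear model whose learned weights are shaped by every user’s row; the feature names and numbers are hypothetical:

```python
# Illustrative sketch: a toy ridge-regression model whose weights are fit
# from hypothetical user features such as [age, location_code, past_purchases].
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))          # hypothetical user feature rows
y = X @ np.array([0.5, -1.2, 2.0]) + rng.normal(scale=0.1, size=1000)

lam = 1e-2                              # ridge regularization strength
w = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)
print("learned feature weights:", w)    # every user's row influenced these values
```

Every row of X contributes to the learned weights, which is why a user’s data cannot be fully “deleted” simply by removing it from the database.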

In theory, information in a database is anonymized, but users concerned about privacy fear that they can still be identified from the bits and pieces of information about them wedged in the models, begetting the need for “right to be forgotten” laws.

The gold standard in the field, Izzo says, is to produce exactly the same model as if the machine learning algorithm had never seen the deleted data points in the first place. That standard, known as “exact deletion,” is hard if not impossible to achieve, especially with large, complicated models like those that recommend products or movies to online shoppers and streamers. Exact data deletion effectively means retraining a model from scratch, Izzo says.

“Doing that requires taking the algorithm offline for retraining. And that costs real money and real time,” he says.
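For the toy ridge model above, exact deletion is cheap; the sketch below (again illustrative, not from the paper) shows what it means in code and why it scales badly: the fit is redone from scratch every time a user is removed.

```python
# Illustrative sketch of exact deletion: drop the departing user's row and
# retrain the model from scratch. Trivial for a closed-form ridge fit, but
# costly (time, compute, downtime) for large production models.
import numpy as np

def ridge_fit(X, y, lam=1e-2):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([0.5, -1.2, 2.0]) + rng.normal(scale=0.1, size=1000)

w_full = ridge_fit(X, y)

# Exact deletion of user 0: full retraining on everything except that row.
w_exact = ridge_fit(np.delete(X, 0, axis=0), np.delete(y, 0))
```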

What is Approximate Deletion?

In solving the deletion conundrum, Zou and Izzo have approached the problem slightly differently from their counterparts in the field. In effect, they create synthetic data to replace — or, more accurately, negate — that of the individual who wishes to be forgotten.

This temporary solution satisfies the privacy-minded individual’s immediate desire to not be identified from data in the model — that is, to be forgotten — while reassuring the computer scientists, and the businesses that rely upon them, that their models will work as planned, at least until a more opportune time when the model can be retrained at lower cost.
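As a rough illustration of the general idea (not the authors’ exact algorithm), the sketch below approximately cancels a removed user’s contribution to the toy ridge model from the earlier sketches with a single influence-function-style update, deferring the full retrain to a later time:

```python
# Illustrative sketch of approximate deletion for the toy ridge model:
# one cheap corrective update that approximately negates the removed user's
# contribution, instead of an immediate full refit.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([0.5, -1.2, 2.0]) + rng.normal(scale=0.1, size=1000)

lam = 1e-2
A = X.T @ X + lam * np.eye(3)
w_full = np.linalg.solve(A, X.T @ y)

# Influence-function-style update for removing user i: one linear solve.
i = 0
x_i, y_i = X[i], y[i]
w_approx = w_full + np.linalg.solve(A, x_i * (x_i @ w_full - y_i))

# Exact retrain for comparison (the "more opportune time" step).
X_m, y_m = np.delete(X, i, axis=0), np.delete(y, i)
w_exact = np.linalg.solve(X_m.T @ X_m + lam * np.eye(3), X_m.T @ y_m)
print("max gap between approximate and exact deletion:", np.abs(w_approx - w_exact).max())
```

The corrective term plays a role analogous to the “synthetic” negating data described above: it pushes the weights toward where they would have landed had the user never been in the training set, echoing the point that, under certain assumptions, approximate deletion can match exact deletion.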

There is a philosophical aspect to the challenge, the authors say. Where privacy, law, and commerce intersect, the discussion begins with a meaningful definition of what it means to “delete” information. Does deletion mean the actual destruction of data? Or is it enough to ensure that no one could ever identify an anonymous person from it? In the end, Izzo says, answering that key question requires balancing the privacy rights of consumers and the needs of science and commerce.

“That's a pretty difficult, non-trivial question,” Izzo says. “For many of the more complicated models used in practice, even if you delete zero people from a database, retraining alone can result in a completely different model. So even defining the proper target for the retrained model is challenging.”

With their approximate deletion approach in hand, the authors validated the effectiveness of their method empirically, confirming their theoretical results on the path to practical application. Putting the method into practice now becomes the goal of future work.

“We think approximate deletion is an important initial step toward solving what has been a difficult challenge for AI,” Zou says.

Stanford HAI's mission is to advance AI research, education, policy and practice to improve the human condition. Learn more. 

Contributor(s): Andrew Myers
