
Regulating AI Through Data Privacy

January 11, 2022

How AI could intersect with California’s new privacy law.

In the absence of a national data privacy law in the U.S., California has been more active than any other state in efforts to fill the gap. The state enacted one of the nation’s first data privacy laws, the California Privacy Rights Act (Proposition 24), in 2020, and the new law will take effect in 2023. A new state agency created by the law, the California Privacy Protection Agency, recently issued an invitation for public comment on the many open questions surrounding the law’s implementation.

Our team of Stanford researchers, graduate students, and undergraduates examined the proposed law and concluded that data privacy can be a useful tool in regulating AI, but California’s new law must be more narrowly tailored to prevent overreach, focus more on AI model transparency, and ensure people’s rights to delete their personal information are not usurped by the use of AI. Additionally, we suggest that the regulation’s proposed transparency provision requiring companies to explain to consumers the logic underlying their “automated decision making” processes could be more powerful if it instead focused on providing greater transparency about the data used to enable such processes. Finally, we argue that the data embedded in machine-learning models must be explicitly included when considering consumers’ rights to delete, know, and correct their data.

The Link Between Consumer Privacy and AI

There’s precedent for regulating AI with data privacy law, at least indirectly. The authors of Proposition 24 borrowed language on “automated decision making” (ADM) technologies directly from the General Data Protection Regulation (GDPR), the E.U. law that governs how residents’ personal data can be collected and used. As defined by the GDPR, ADM technologies are those that have “the ability to make decisions by technological means without human involvement,” and the GDPR gives consumers the right to refuse to be subject to any such automated decision insofar as it produces legal consequences, such as a loan denial. While ADM is not synonymous with AI, a broad range of AI-driven processes meet the GDPR’s definition and have therefore been directly impacted by the law. The inclusion of restrictions on ADM technologies in the California Privacy Rights Act is likely to have a similarly broad impact in California — or perhaps even broader, given some of the issues with the CPRA’s current language. 

More generally, though, given AI’s reliance on vast quantities of data, regulating AI through data privacy law is not only inevitable to some extent, but also a powerful regulatory strategy that lawmakers should carefully consider as they explore approaches to mitigating AI’s risks. Most attention on regulating artificial intelligence to date has focused on algorithms, but as the GDPR demonstrates, data regulation can also be a tool for constraining the contexts in which AI can be used. For example, the California Consumer Privacy Act includes a provision regarding data collection and reuse that prohibits companies from repurposing data outside of the original scope of collection. This means that if you collect data from a California resident for one purpose, you can’t reuse it to train a machine-learning model without explicit consent if that reuse is not consistent with the original rationale for collection. When effectively enforced, data privacy law inevitably impacts AI at its root: data. The challenge, therefore, is to ensure that this impact is calibrated in an effective, well-targeted way.

Recommendation 1: Narrow the Scope of Automated Decision Making

With respect to Proposition 24, we are concerned that the inclusion of automated decision making language as currently written in the regulation would lead to an overly broad interpretation of just what constitutes an ADM process, potentially requiring the labeling of nearly any type of algorithm deployed on websites with California users. The CPRA adopted the GDPR’s language on ADM technologies, but left out something important — while the GDPR narrows the scope of regulated ADM technologies to those that have legal consequences for an individual, the CPRA includes no such qualifier. The existing text could apply equally to an algorithm on a car rental website that verifies your age as to an ADM system that uses your personal data to calculate your insurability risk. This breadth would far exceed the intent of the proposition, which is to focus on privacy issues generally and more specifically on curbing the practices that enable the widespread data profiling of individuals at scale. We recommend the agency adopt a narrower definition of ADM that focuses on specific privacy outcomes of concern. 

Recommendation 2: Focus on Transparency over Explainability

Research has shown that even in the somewhat infrequent case where it’s feasible to meaningfully explain the rationale of an AI-driven decision making process, passing this information directly to consumers may be of limited value. A person who hasn’t had any training in machine learning may draw little meaning from a collection of model weights and parameters. 

Rather than focusing on explaining the often-opaque rationales of ADM technologies, companies should be required under the CPRA to supply consumers with detailed information on the source and contents of the data that drives those rationales, including whether the consumer was asked to consent to the collection of that data by its earliest collector.
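
To make the idea concrete, here is a hypothetical sketch of what a per-source data-provenance disclosure might contain. The record structure and field names are our own illustration, not language drawn from the CPRA or the draft regulations.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ProvenanceRecord:
    """Hypothetical disclosure describing one source of personal data behind an ADM process."""
    original_collector: str        # entity that first collected the data
    collection_purpose: str        # stated purpose at the time of collection
    consent_obtained: bool         # whether the consumer consented to that original collection
    data_categories: List[str] = field(default_factory=list)  # kinds of personal data involved

# Example: one record a company might surface alongside an automated decision.
record = ProvenanceRecord(
    original_collector="Example Data Broker, Inc.",
    collection_purpose="loyalty-program enrollment",
    consent_obtained=False,
    data_categories=["purchase history", "approximate location"],
)
```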

Recommendation 3: Respect the Right to Delete in Machine-Learning Models 

The CPRA empowers consumers with new rights to delete, correct, and know what personal data a business has collected about them. But for an AI-driven company, a data deletion request is not as simple as clearing a cell in a data table. Personal data collected from consumers is used to train machine-learning algorithms that are then often deployed at scale — and the impact of any given data point, accurate or not, will continue to propagate until that model is retrained without that data. 

But despite the sometimes significant costs required to retrain AI models, consumer rights to delete and correct must extend to data embedded within such models in order to be meaningful. A failure to include such a provision would directly undermine consumer privacy rights as envisioned by the CPRA — research has shown that under some conditions, original training data can be reconstructed and ultimately deanonymized by analyzing the behavior of a model that incorporates it.

And new research shows this process might be possible without serious costs and time investments for companies. Stanford professor and HAI Faculty Affiliate James Zou, who co-authored these comments, helped to develop a technique called “approximate deletion” that is designed to negate the impact of specified data on an AI model without divulging the contents of that data or requiring a full retraining of the model.
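
As a rough illustration of the underlying idea, consider the simplest case: for a ridge-regression model, a single training point’s influence can be removed exactly with a cheap rank-one update rather than a full retrain. This sketch is not the approximate-deletion technique itself, which targets models where such closed-form updates are unavailable, and all names and parameters are illustrative.

```python
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Fit ridge regression, keeping the inverse Gram matrix so deletions are cheap."""
    d = X.shape[1]
    A_inv = np.linalg.inv(X.T @ X + lam * np.eye(d))  # (X^T X + lam*I)^{-1}
    b = X.T @ y
    return A_inv @ b, A_inv, b

def delete_point(A_inv, b, x, y_val):
    """Remove one training point via a Sherman-Morrison rank-one downdate (no retraining)."""
    Ax = A_inv @ x
    A_inv_new = A_inv + np.outer(Ax, Ax) / (1.0 - x @ Ax)
    b_new = b - y_val * x
    return A_inv_new @ b_new, A_inv_new, b_new

# Usage: deleting point 0 yields the same weights as retraining without it.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5) + 0.1 * rng.normal(size=100)
w, A_inv, b = fit_ridge(X, y)
w_del, _, _ = delete_point(A_inv, b, X[0], y[0])
w_ref, _, _ = fit_ridge(np.delete(X, 0, axis=0), np.delete(y, 0))
assert np.allclose(w_del, w_ref)
```

For more complex models, where no such closed-form update exists, approximate-deletion methods aim to achieve a similar effect at a comparably low cost.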

Explicitly covering data embedded in AI models under the new deletion rights also dovetails with our core argument for greater transparency around data provenance. If provided detailed information on the origin of the personal data that underlies AI models, consumers would be empowered to delete or correct specific data points that drive inaccuracies or bias in these models. Meanwhile, businesses would be incentivized to more carefully vet data at the outset, in an effort to avoid costly alterations and retrainings after a model is already deployed.

This added weight of accountability could go a long way in reining in harmful uses of AI, while avoiding the need to define narrow, ephemeral brackets of high-risk algorithms.

Jennifer King is the Privacy and Data Policy Fellow at the Stanford University Institute for Human-Centered Artificial Intelligence. Eli MacKinnon is a graduate student focusing on international policy. 

Stanford HAI’s mission is to advance AI research, education, policy and practice to improve the human condition.

Contributor(s)
Eli MacKinnon and Dr. Jennifer King

Related News

What Workers Really Want from Artificial Intelligence
Shana Lynch, Jul 07, 2025
A Stanford study captures the gap between worker desires and AI’s abilities, and highlights areas ripe for research and development.

Stanford AI Scholars Find Support for Innovation in a Time of Uncertainty
Nikki Goth Itoi, Jul 01, 2025
Stanford HAI offers critical resources for faculty and students to continue groundbreaking research across the vast AI landscape.

New Large Language Model Helps Patients Understand Their Radiology Reports
Vignesh Ramachandran, Jun 23, 2025
‘RadGPT’ cuts through medical jargon to answer common patient questions.