In the absence of a national data privacy law in the U.S., California has been more active than any other state in efforts to fill the gap on a state level. The state's California Consumer Privacy Act (CCPA), one of the nation’s first comprehensive data privacy laws, took effect in 2020, and an additional law approved by voters that same year, the California Privacy Rights Act (CPRA, or Proposition 24), will take effect in 2023. A new state agency created by the CPRA, the California Privacy Protection Agency, recently issued an invitation for public comment on the many open questions surrounding the law’s implementation.
Our team of Stanford researchers, graduate students, and undergraduates examined the proposed law and concluded that data privacy can be a useful tool in regulating AI, but California’s new law must be more narrowly tailored to prevent overreach, focus more on AI model transparency, and ensure people’s rights to delete their personal information are not usurped by the use of AI. Additionally, we suggest that the regulation’s proposed transparency provision requiring companies to explain to consumers the logic underlying their “automated decision making” processes could be more powerful if it instead focused on providing greater transparency about the data used to enable such processes. Finally, we argue that the data embedded in machine-learning models must be explicitly included when considering consumers’ rights to delete, know, and correct their data. A copy of our comments is available here, and the Agency recently posted all of the submitted comments to its website.
The Link Between Consumer Privacy and AI
There’s precedent for regulating AI with data privacy law, at least indirectly. The authors of Proposition 24 borrowed language on “automated decision making” (ADM) technologies directly from the General Data Protection Regulation (GDPR), the E.U. law that governs how residents’ personal data can be collected and used. As defined by the GDPR, ADM technologies are those that have “the ability to make decisions by technological means without human involvement,” and the GDPR gives consumers the right to refuse to be subject to any such automated decision insofar as it produces legal consequences, such as a loan denial. While ADM is not synonymous with AI, a broad range of AI-driven processes meet the GDPR’s definition and have therefore been directly impacted by the law. The inclusion of restrictions on ADM technologies in the California Privacy Rights Act is likely to have a similarly broad impact in California — or perhaps even broader, given some of the issues with the CPRA’s current language.
More generally, though, given AI’s reliance on vast quantities of data, regulating AI through data privacy law is not only inevitable to some extent, but also a powerful regulatory strategy that lawmakers should carefully consider as they explore approaches to mitigating AI’s risks. Most attention on regulating artificial intelligence to date has focused on algorithms, but as the GDPR demonstrates, data regulation can also be a tool for constraining the contexts in which AI can be used. For example, the California Consumer Privacy Act includes a provision regarding data collection and reuse that prohibits companies from repurposing data outside of the original scope of collection. This means that if you collect data from a California resident for one purpose, you can’t reuse it to train a machine-learning model without explicit consent if that reuse is not consistent with the original rationale for collection. When effectively enforced, data privacy law inevitably impacts AI at its root, data, and the challenge therefore is to ensure that this impact is calibrated in an effective, well-targeted way.
Recommendation 1: Narrow the Scope of Automated Decision-Making
With respect to Proposition 24, we are concerned that the automated decision-making language as currently written in the regulation would lead to an overly broad interpretation of just what constitutes an ADM process, potentially requiring the labeling of nearly any type of algorithm deployed on websites with California users. The CPRA adopted the GDPR’s language on ADM technologies but left out something important: while the GDPR narrows the scope of regulated ADM technologies to those that have legal consequences for an individual, the CPRA includes no such qualifier. The existing text could apply as readily to an algorithm on a car rental website that verifies your age as to an ADM system that uses your personal data to calculate your insurability risk. This breadth would far exceed the intent of the proposition, which is to focus on privacy issues generally and more specifically on curbing the practices that enable the widespread data profiling of individuals at scale. We recommend the agency adopt a narrower definition of ADM that focuses on specific privacy outcomes of concern.
Recommendation 2: Focus on Transparency over Explainability
Research has shown that even in the somewhat infrequent case where it’s feasible to meaningfully explain the rationale of an AI-driven decision-making process, passing this information directly to consumers may be of limited value. A person who hasn’t had any training in machine learning may draw little meaning from a collection of model weights and parameters.
Rather than explaining the often-opaque rationales of ADM technologies, companies should be required under the CPRA to supply consumers with detailed information on the source and contents of the data that drives those rationales, including whether the consumer was asked to consent to the collection of that data by its earliest collector.
Recommendation 3: Respect the Right to Delete in Machine-Learning Models
The CPRA empowers consumers with new rights to delete, correct, and know what personal data a business has collected about them. But for an AI-driven company, a data deletion request is not as simple as clearing a cell in a data table. Personal data collected from consumers is used to train machine-learning algorithms that are then often deployed at scale — and the impact of any given data point, accurate or not, will continue to propagate until that model is retrained without that data.
But despite the sometimes significant costs required to retrain AI models, consumer rights to delete and correct must extend to data embedded within such models in order to be meaningful. A failure to include such a provision would directly undermine consumer privacy rights as envisioned by the CPRA — research has shown that under some conditions, original training data can be reconstructed and ultimately deanonymized by analyzing the behavior of a model that incorporates it.
And new research shows this process might be possible without serious costs and time investments for companies. Stanford professor and HAI Faculty Affiliate James Zou, who co-authored these comments, helped to develop a technique called “approximate deletion” that is designed to negate the impact of specified data on an AI model without divulging the contents of that data or requiring a full retraining of the model.
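To make concrete what it means to remove one record's influence without retraining from scratch, consider a deliberately simple case. The sketch below is not the approximate deletion technique mentioned above. It is an exact update for a one-variable least-squares fit, a model simple enough that a record's contribution can be subtracted directly from a few running totals. The class name, function names, and data are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class LinearModel:
    """Least-squares line y = slope * x + intercept, kept as running sums.

    Hypothetical illustration only: real machine-learning models rarely
    reduce to a handful of sums, which is why approximate methods exist.
    """
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def add(self, x: float, y: float) -> None:
        # Fold one record into the running totals.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def delete(self, x: float, y: float) -> None:
        # Honor a deletion request by subtracting that record's
        # contribution; no other training record is revisited.
        self.n -= 1
        self.sx -= x
        self.sy -= y
        self.sxx -= x * x
        self.sxy -= x * y

    def coefficients(self) -> Tuple[float, float]:
        # Closed-form least-squares solution from the running totals.
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return slope, intercept


def fit(records: Iterable[Tuple[float, float]]) -> LinearModel:
    model = LinearModel()
    for x, y in records:
        model.add(x, y)
    return model


# Made-up data; the last record stands in for one consumer's data.
records = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 8.1), (5.0, 30.0)]

model = fit(records)
before = model.coefficients()

model.delete(5.0, 30.0)                       # update in place
updated = model.coefficients()
retrained = fit(records[:-1]).coefficients()  # full retraining, for comparison
```

In this toy case the in-place update and the full retraining agree up to floating-point rounding, and the fitted line changes noticeably once the record is gone. Models such as neural networks have no comparable closed form, so in practice the cost of honoring a deletion request depends on techniques that approximate this effect.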
Explicitly covering data embedded in AI models under the new deletion rights also dovetails with our core argument for greater transparency around data provenance. If provided detailed information on the origin of the personal data that underlies AI models, consumers would be empowered to delete or correct specific data points that drive inaccuracies or bias in these models. Meanwhile, businesses would be incentivized to more carefully vet data at the outset, in an effort to avoid costly alterations and retrainings after a model is already deployed.
This added weight of accountability could go a long way in reining in harmful uses of AI, while avoiding the need to define narrow, ephemeral brackets of high-risk algorithms.
Jennifer King is the Privacy and Data Policy Fellow at the Stanford University Institute for Human-Centered Artificial Intelligence. Eli MacKinnon is a graduate student focusing on international policy.