Rethinking Privacy in the AI Era: Policy Provocations for a Data-Centric World

This white paper explores the current and future impact of privacy and data protection legislation on AI development and provides recommendations for mitigating privacy harms in the AI era.
Executive Summary
In this paper, we present a series of arguments and predictions about how existing and future privacy and data protection regulation will impact the development and deployment of AI systems.
Data is the foundation of all AI systems. Going forward, AI development will continue to increase developers’ hunger for training data, fueling an even greater race for data acquisition than we have already seen in past decades.
Largely unrestrained data collection poses unique risks to privacy that extend beyond the individual level: these risks aggregate into societal-level harms that cannot be addressed through the exercise of individual data rights alone.
While existing and proposed privacy legislation, grounded in the globally accepted Fair Information Practices (FIPs), implicitly regulates AI development, it is not sufficient to address the data acquisition race or the resulting individual and systemic privacy harms.
Even legislation that contains explicit provisions on algorithmic decision-making and other forms of AI does not provide the data governance measures needed to meaningfully regulate the data used in AI systems.
We present three suggestions for how to mitigate the risks to data privacy posed by the development and adoption of AI:
Denormalize data collection by default by shifting away from opt-out to opt-in data collection. Data collectors must facilitate true data minimization through “privacy by default” strategies and adopt technical standards and infrastructure for meaningful consent mechanisms.
Focus on the AI data supply chain to improve privacy and data protection. Ensuring dataset transparency and accountability across the entire life cycle must be a focus of any regulatory system that addresses data privacy.
Flip the script on the creation and management of personal data. Policymakers should support the development of new governance mechanisms and technical infrastructure (e.g., data intermediaries and data permissioning infrastructure) to support and automate the exercise of individual data rights and preferences.
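To make the third suggestion more concrete, the sketch below illustrates, purely hypothetically, what a machine-readable data permission record could look like, one that a data intermediary or permissioning layer might evaluate automatically before personal data is used for a given purpose such as model training. The field names, purpose labels, and the is_use_permitted check are illustrative assumptions, not an existing legal or technical standard.

```python
# Hypothetical sketch of a machine-readable data permission record that a data
# intermediary could evaluate before personal data is used for a given purpose.
# Field names and purpose categories are illustrative, not an existing standard.
from dataclasses import dataclass, field


@dataclass
class DataPermission:
    subject_id: str                                             # pseudonymous identifier for the data subject
    permitted_purposes: set[str] = field(default_factory=set)   # e.g., {"service_delivery"}
    opt_in: bool = False                                        # use is denied unless explicitly granted
    revoked: bool = False                                       # subjects can withdraw consent at any time


def is_use_permitted(record: DataPermission, purpose: str) -> bool:
    """Return True only if the subject opted in, has not revoked consent,
    and explicitly granted this purpose ("privacy by default")."""
    return record.opt_in and not record.revoked and purpose in record.permitted_purposes


if __name__ == "__main__":
    record = DataPermission(subject_id="u-123",
                            permitted_purposes={"service_delivery"},
                            opt_in=True)
    print(is_use_permitted(record, "model_training"))    # False: training was never opted into
    print(is_use_permitted(record, "service_delivery"))  # True: explicitly granted purpose
```

In such a design, the absence of a record or of an explicit grant defaults to "no", which is the opt-in posture the first suggestion calls for.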
Introduction
Foundation models (e.g., GPT-4, Llama 2) are at the epicenter of AI, driving technological innovation and billions of dollars in investment. This paradigm shift has sparked widespread demands for regulation. Animated by factors as diverse as declining transparency, unsafe labor practices, limited protections for copyright and creative work, market concentration, and productivity gains, many have called for policymakers to take action.
Central to the debate about how to regulate foundation models is the process by which they are released. Some foundation models, like Google DeepMind’s Flamingo, are fully closed, meaning they are available only to the model developer; others, such as OpenAI’s GPT-4, are limited access, available to the public but only as a black box; and still others, such as Meta’s Llama 2, are more open, with widely available model weights that enable downstream modification and scrutiny. As of August 2023, the U.K.’s Competition and Markets Authority documents, based on data from Stanford’s Ecosystem Graphs, that open release is the most common release approach for publicly disclosed models. Developers like Meta, Stability AI, Hugging Face, Mistral, Together AI, and EleutherAI frequently release models openly.
Governments around the world are issuing policy related to foundation models. As part of these efforts, open foundation models have garnered significant attention: The recent U.S. Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence tasks the National Telecommunications and Information Administration with preparing a report on open foundation models for the president. In the EU, open foundation models trained with fewer than 10^25 floating point operations (a measure of the amount of compute expended) appear to be exempted under the recently negotiated AI Act. The U.K.’s AI Safety Institute will “consider open-source systems as well as those deployed with various forms of access controls” as part of its initial priorities. Beyond governments, the Partnership on AI has introduced guidelines for the safe deployment of foundation models, recommending against open release for the most capable foundation models.
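For intuition about the scale of the 10^25 FLOP threshold, a common rule of thumb from the scaling-laws literature approximates the training compute of a dense transformer as roughly 6 × (parameter count) × (training tokens). The sketch below applies that heuristic to an illustrative 70-billion-parameter model trained on 2 trillion tokens; the heuristic and the specific figures are assumptions for illustration, not the AI Act's methodology or any developer's reported numbers.

```python
# Rough heuristic (not the AI Act's methodology): training compute for a dense
# transformer is approximately 6 * N * D FLOPs, where N is the parameter count
# and D is the number of training tokens.
def approx_training_flops(num_params: float, num_tokens: float) -> float:
    return 6 * num_params * num_tokens


EU_AI_ACT_THRESHOLD = 1e25  # the 10^25 FLOP threshold referenced above

# Illustrative example: a 70B-parameter model trained on 2 trillion tokens.
flops = approx_training_flops(70e9, 2e12)
print(f"Estimated training compute: {flops:.1e} FLOPs")             # ~8.4e+23 FLOPs
print(f"Below the 10^25 threshold: {flops < EU_AI_ACT_THRESHOLD}")  # True
```

Under this rough estimate, a model of that scale would fall one to two orders of magnitude below the threshold, which is why many of today's openly released models would not be captured by it.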
Policy on foundation models should support the open foundation model ecosystem, while providing resources to monitor risks and create safeguards to address harms. Open foundation models provide significant benefits to society by promoting competition, accelerating innovation, and distributing power. For example, small businesses hoping to build generative AI applications could choose among a variety of open foundation models that offer different capabilities and are often less expensive than closed alternatives. Further, open models are marked by greater transparency and, thereby, accountability. When a model is released with its training data, independent third parties can better assess the model’s capabilities and risks.
However, an emerging concern is whether open foundation models pose distinct risks to society. Unlike closed foundation model developers, open developers have limited ability to restrict the use of their models by malicious actors, who can easily remove safety guardrails. Recent studies claim that open foundation models make it easier to generate disinformation and spear-phishing emails and to assist in developing cyberweapons and bioweapons.
Correctly characterizing these distinct risks requires centering the marginal risk: To what extent do open foundation models increase risk relative to (a) closed foundation models or (b) pre-existing technologies like search engines? We find that, for many dimensions, the existing evidence about the marginal risk of open foundation models remains quite limited. In some instances, such as AI-generated child sexual abuse material (CSAM) and nonconsensual intimate imagery (NCII), harms stemming from open foundation models have been better documented. For these demonstrated harms, proposals to restrict the release of foundation models by licensing compute-intensive models are mismatched, because the text-to-image models used to cause these harms require far less compute to train than the models such licensing would target.
More broadly, several regulatory approaches under consideration are likely to have a disproportionate impact on open foundation models and their developers without meaningfully reducing risk. Even though these approaches do not differentiate between open and closed foundation model developers, they yield asymmetric compliance burdens. For example, legislation that holds developers liable for content generated using their models or derivatives of them would disproportionately harm open developers, since users can modify openly released models to generate illicit content. Policymakers should exercise caution to avoid unintended consequences and should ensure adequate consultation with open foundation model developers before taking action.