Be Careful What You Tell Your AI Chatbot | Stanford HAI

Date
October 15, 2025
Topics
Privacy, Safety, Security
Generative AI
Regulation, Policy, Governance

A Stanford study reveals that leading AI companies are pulling user conversations for training, highlighting privacy risks and a need for clearer policies.

Last month, Anthropic made a quiet change to its terms of service for customers: Conversations you have with its AI chatbot, Claude, will be used for training its large language model by default, unless you opt out. 

Anthropic is not alone in adopting this policy. A recent study of frontier developers’ privacy policies found that six leading U.S. companies feed user inputs back into their models to improve capabilities and win market share. Some give consumers the choice to opt out, while others do not.

Given this trend, should users of AI-powered chat systems worry about their privacy? “Absolutely yes,” says the study’s lead author, Jennifer King, Privacy and Data Policy Fellow at the Stanford Institute for Human-Centered AI. “If you share sensitive information in a dialogue with ChatGPT, Gemini, or other frontier models, it may be collected and used for training, even if it’s in a separate file that you uploaded during the conversation.”

King and her team of Stanford scholars examined AI developers’ privacy policies and identified several causes for concern, including long data retention periods, training on children’s data, and a general lack of transparency and accountability in developers’ privacy practices. In light of these findings, consumers should think twice about the information they share in AI chat conversations and, whenever possible, affirmatively opt out of having their data used for training. 

The History of Privacy Policies

As a communication tool, the internet-era privacy policy that’s now being applied to AI chats is deeply flawed. Typically written in convoluted legal language, these documents are difficult for consumers to read and understand. Yet, we have to agree to them if we want to visit websites, query search engines, and interact with large language models (LLMs).

In the last five years, AI developers have been scraping massive amounts of information from the public internet to train their models, a process that can inadvertently pull personal information into their datasets. “We have hundreds of millions of people interacting with AI chatbots, which are collecting personal data for training, and almost no research has been conducted to examine the privacy practices for these emerging tools,” King explains. In the United States, she adds, privacy protections for personal data collected by or shared with LLM developers are complicated by a patchwork of state-level laws and a lack of federal regulation.

In an effort to help close this research gap, the Stanford team compared the privacy policies of six U.S. companies: Amazon (Nova), Anthropic (Claude), Google (Gemini), Meta (Meta AI), Microsoft (Copilot), and OpenAI (ChatGPT). They analyzed a web of documents for each LLM, including its published privacy policies, linked subpolicies, and associated FAQs and guidance accessible from the chat interfaces, for a total of 28 lengthy documents.

To evaluate these policies, the researchers used the California Consumer Privacy Act as their benchmark, since it is the most comprehensive privacy law in the United States and all six frontier developers are required to comply with it. For each company, the researchers analyzed language in the documentation to discern how the stated policies address three questions:

  1. Are user inputs to chatbots used to train or improve LLMs?

  2. What sources and categories of personal consumer data are collected, stored, and processed to train or improve LLMs?

  3. What are the users’ options for opting into or out of having their chats used for training?

Blurred Boundaries

The scholars found that all six companies use users’ chat data by default to train their models, and some developers keep this information in their systems indefinitely. Some, but not all, of the companies state that they de-identify personal information before using it for training purposes. And some developers allow humans to review users’ chat transcripts for model training purposes.

In the case of multiproduct companies, such as Google, Meta, Microsoft, and Amazon, user interactions also routinely get merged with information gleaned from other products consumers use on those platforms: search queries, purchases, social media engagement, and the like.

These practices can become problematic when, for example, users share personal biometric and health data without considering the implications. Here’s a realistic scenario: Imagine asking an LLM for dinner ideas. Maybe you specify that you want low-sugar or heart-friendly recipes. The chatbot can draw inferences from that input, and the algorithm may decide you fit a classification as a health-vulnerable individual. “This determination drips its way through the developer’s ecosystem. You start seeing ads for medications, and it’s easy to see how this information could end up in the hands of an insurance company. The effects cascade over time,” King explains.

Another red flag the researchers discovered concerns the privacy of children: Developers’ practices vary in this regard, but most are not taking steps to remove children’s input from their data collection and model training processes. Google announced earlier this year that it would train its models on data from teenagers, if they opt in. By contrast, Anthropic says it neither collects children’s data nor allows users under the age of 18 to create accounts, although it does not require age verification. And Microsoft says it collects data from children under 18, but does not use it to build language models. All of these practices raise consent issues, as children cannot legally consent to the collection and use of their data.

Privacy-Preserving AI

Across the board, the Stanford scholars observed that developers’ privacy policies lack essential information about their practices. They recommend policymakers and developers address data privacy challenges posed by LLM-powered chatbots through comprehensive federal privacy regulation, affirmative opt-in for model training, and filtering personal information from chat inputs by default.
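The study itself does not prescribe an implementation, but the last recommendation, filtering personal information from chat inputs by default, can be sketched in miniature. The pattern names and the `redact` function below are illustrative assumptions, not any developer's actual pipeline; production de-identification is considerably harder than a few regular expressions.

```python
import re

# Illustrative patterns only -- real systems combine ML-based entity
# recognition with rules, and still miss context-dependent identifiers.
PII_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]?\d{3}[-.\s]?\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace matched identifiers with placeholder tokens before the
    chat input is stored or routed into a training corpus."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

print(redact("Reach me at jane@example.com or 555-867-5309."))
# -> Reach me at [EMAIL] or [PHONE].
```

Even a naive filter like this, applied before retention rather than after, would change the default the researchers criticize: sensitive tokens never reach the training pipeline in the first place.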

“As a society, we need to weigh whether the potential gains in AI capabilities from training on chat data are worth the considerable loss of consumer privacy. And we need to promote innovation in privacy-preserving AI, so that user privacy isn’t an afterthought,” King concludes. 

Contributor(s)
Nikki Goth Itoi

Related News

Spatial Intelligence Is AI’s Next Frontier
TIME
Dec 11, 2025
Media Mention

"This is AI’s next frontier, and why 2025 was such a pivotal year," writes HAI Co-Director Fei-Fei Li.

Transparency in AI is on the Decline
Rishi Bommasani, Kevin Klyman, Alexander Wan, Percy Liang
Dec 09, 2025
News

A new study shows the AI industry is withholding key information.

Stanford Research Teams Receive New Hoffman-Yee Grant Funding for 2025
Nikki Goth Itoi
Dec 09, 2025
News

Five teams will use the funding to advance their work in biology, generative AI and creativity, policing, and more.
