What is Training Data? | Stanford HAI

What is Training Data?

Training Data is the collection of examples—such as text, images, audio, or other information—used to teach machine learning models how to perform specific tasks. The model learns by analyzing patterns, relationships, and features within this data, adjusting its internal parameters to make accurate predictions or decisions. The quality, quantity, and diversity of training data largely determine how well an AI system will perform, making it one of the most critical components of machine learning.
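The learning process described above can be illustrated with a deliberately tiny sketch: a one-parameter linear model fit to a handful of (input, label) pairs by gradient descent. The data, model, and learning rate here are all hypothetical choices for illustration, not any particular system's implementation.

```python
# Toy example: a model with one internal parameter, w, "learns" the
# pattern y = 2x hidden in its training data by repeatedly reducing
# its prediction error (illustrative sketch, not production code).
training_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # (input, label) pairs

w = 0.0               # the model's single internal parameter
learning_rate = 0.05  # how aggressively each error adjusts w

for _ in range(200):  # many passes over the training data
    for x, y in training_data:
        prediction = w * x
        error = prediction - y
        w -= learning_rate * error * x  # gradient step on squared error

print(round(w, 2))  # w converges toward 2.0, the pattern in the data
```

The same dynamic explains why data quality matters: mislabeled or unrepresentative pairs in `training_data` would pull `w` toward the wrong value, and no amount of extra training would recover the true pattern.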



Training Data mentioned at Stanford HAI

Explore Similar Terms:

Supervised Learning | Synthetic Data | Data Augmentation

See Full List of Terms & Definitions

Borrowing from the Law to Filter Training Data for Foundation Models
Katharine Miller | Aug 10 | news
Topics: Machine Learning

Using “Pile of Law,” a dataset of legal materials, Stanford researchers explore filtering private or toxic content from training data for foundation models.
Toward Fairness in Health Care Training Data
Amit Kaushal, Russ Altman, Curtis Langlotz | Quick Read | Oct 01 | policy brief
Topics: Healthcare; Ethics, Equity, Inclusion

This brief highlights the lack of geographic representation in medical-imaging AI training data and calls for nationwide, diversity-focused data-sharing initiatives.
AI can be sexist and racist — it’s time to make it fair
James Zou and Londa Schiebinger | Jul 17 | news
Topics: Arts, Humanities; Machine Learning

Computer scientists must identify sources of bias, de-bias training data and develop artificial-intelligence algorithms that are robust to skews in the data, argue James Zou and Londa Schiebinger in Nature.
How Bias Hides in ‘Kitchen Sink’ Approaches to Data
Julian Nyarko, Andrew Myers | May 30 | news
Topics: Natural Language Processing; Machine Learning

In risk modeling, AI researchers take a more-is-better approach to training data, but a new study argues that a less-is-more approach may be preferable.
“Flying in the Dark”: Hospital AI Tools Aren’t Well Documented
Edmund L. Andrews | Aug 23 | news
Topics: Healthcare; Machine Learning

A new study reveals models aren’t reporting enough, leaving users blind to potential model errors such as flawed training data and calibration drift.
Improving Equity and Access to Non-English Large Language Models
Prabha Kannan | Apr 22 | news
Topics: Natural Language Processing

The lessons learned from the fine-tuning and evaluation of Vietnamese LLMs could help broaden access to models beyond English speakers.
Whose Opinions Do Language Models Reflect?
Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, Tatsunori Hashimoto | Quick Read | Sep 20 | policy brief
Topics: Generative AI; Ethics, Equity, Inclusion

This brief introduces a quantitative framework that allows policymakers to evaluate the behavior of language models to assess what kinds of opinions they reflect.

Enroll in a Human-Centered AI Course

This AI program covers technical fundamentals, business implications, and societal considerations of artificial intelligence.