What are Hallucinations (in AI)?

Hallucinations in AI refer to instances where an artificial intelligence system generates information or responses that are incorrect, misleading, or entirely fabricated, yet presents them as factual. This often happens in language models or image generators when the AI produces outputs that are not supported by its training data or by real-world facts.
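As a rough illustration, the Python sketch below shows a hallucinated answer being flagged by comparing it against a trusted reference text. The `fake_model_answer` function is a hypothetical stub standing in for a real model call, and the word-overlap check is a deliberately naive stand-in for real fact verification.

```python
# Toy illustration of a hallucination being caught by a grounding check.
# fake_model_answer is a hypothetical stub, not a real LLM API.

REFERENCE = (
    "The Eiffel Tower was completed in 1889 and stands in Paris. "
    "It was designed and built by Gustave Eiffel's engineering company."
)

def fake_model_answer(question: str) -> str:
    # Confidently wrong output, presented as factual: a hallucination.
    return "The Eiffel Tower was completed in 1925 and stands in Lyon."

def unsupported_claims(answer: str, reference: str) -> list[str]:
    """Flag sentences containing content words absent from the reference.
    A deliberately naive word-overlap check, not real fact verification."""
    ref_words = {w.strip(".,").lower() for w in reference.split()}
    flagged = []
    for sentence in answer.split(". "):
        content = {w.strip(".,").lower() for w in sentence.split() if len(w) > 3}
        if content - ref_words:  # some content word has no support
            flagged.append(sentence)
    return flagged

answer = fake_model_answer("When was the Eiffel Tower completed?")
print(unsupported_claims(answer, REFERENCE))
# ['The Eiffel Tower was completed in 1925 and stands in Lyon.']
```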


Hallucinations mentioned at Stanford HAI

Explore Similar Terms:

AI Safety | AI Alignment | Large Language Model (LLM)

See Full List of Terms & Definitions

AI on Trial: Legal Models Hallucinate in 1 out of 6 (or More) Benchmarking Queries
Faiz Surani, Daniel E. Ho
May 23
news

A new study reveals the need for benchmarking and public evaluations of AI tools in law.
Hallucinating Law: Legal Mistakes with Large Language Models are Pervasive
Matthew Dahl, Varun Magesh, Mirac Suzgun, Daniel E. Ho
Jan 11
news

A new study finds disturbing and pervasive errors among three popular models on a wide range of legal tasks.
Reduce AI Hallucinations With This Neat Software Trick
WIRED
Jun 14
media mention

Stanford HAI Senior Fellow Dan Ho gives input on how to reduce AI hallucinations and discusses his research into AI legal tools that rely on retrieval-augmented generation (RAG).

Generative AI
Law Enforcement and Justice
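For context, retrieval-augmented generation reduces hallucination by retrieving relevant passages first and constraining the model to answer only from them. The sketch below is a minimal toy version of that idea; `call_llm`, the sample corpus, and the word-overlap retriever are all hypothetical stand-ins rather than any specific tool's API.

```python
# Minimal retrieval-augmented generation (RAG) sketch: retrieve passages
# first, then constrain the model to answer only from them. call_llm and
# the word-overlap retriever are hypothetical stand-ins, not a real API.

CORPUS = [
    "Rule 11 sanctions apply to filings made for an improper purpose.",
    "Summary judgment is proper when there is no genuine dispute of material fact.",
    "The notice of appeal must be filed within 30 days of the judgment.",
]

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Rank passages by crude word overlap with the query; a real system
    # would use a vector index or a search engine here.
    q = set(query.lower().split())
    return sorted(corpus, key=lambda p: -len(q & set(p.lower().split())))[:k]

def call_llm(prompt: str) -> str:
    # Hypothetical stub; a real system would query an actual model.
    return "(model answer constrained to the cited passages)"

def rag_answer(question: str) -> str:
    passages = retrieve(question, CORPUS)
    prompt = (
        "Answer using ONLY the passages below. If they are insufficient, "
        "say so instead of guessing.\n\n"
        + "\n".join(f"- {p}" for p in passages)
        + f"\n\nQuestion: {question}"
    )
    return call_llm(prompt)

print(rag_answer("When is summary judgment proper?"))
```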
WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
Sina Semnani, Violet Yao, Monica Lam, Heidi Zhang
Dec 01
Research

This paper presents the first few-shot LLM-based chatbot that almost never hallucinates and has high conversationality and low latency. WikiChat is grounded on the English Wikipedia, the largest curated free-text corpus. WikiChat generates a response from an LLM, retains only the grounded facts, and combines them with additional information it retrieves from the corpus to form factual and engaging responses. We distill WikiChat based on GPT-4 into a 7B-parameter LLaMA model with minimal loss of quality, to significantly improve its latency, cost and privacy, and facilitate research and deployment. Using a novel hybrid human-and-LLM evaluation methodology, we show that our best system achieves 97.3% factual accuracy in simulated conversations. It significantly outperforms all retrieval-based and LLM-based baselines, and by 3.9%, 38.6% and 51.0% on head, tail and recent knowledge compared to GPT-4. Compared to previous state-of-the-art retrieval-based chatbots, WikiChat is also significantly more informative and engaging, just like an LLM. WikiChat achieves 97.9% factual accuracy in conversations with human users about recent topics, 55.0% better than GPT-4, while receiving significantly higher user ratings and more favorable comments.

Natural Language Processing
Foundation Models
Machine Learning
Generative AI
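The pipeline the abstract describes (draft a response with an LLM, keep only the grounded facts, combine them with retrieved text) can be sketched roughly as below. Every component here is a toy stand-in; the paper's actual stages are few-shot, LLM-based, and operate over real Wikipedia retrieval.

```python
# Rough sketch of the WikiChat-style pipeline described in the abstract:
# draft with an LLM, keep only claims grounded in retrieved passages, then
# compose the reply from surviving facts plus retrieved information.
# Every component here is a toy stand-in for the paper's few-shot stages.

WIKI = [
    "Mount Everest is 8,849 metres tall.",
    "Mount Everest lies on the border of Nepal and China.",
]

def llm_draft(question: str) -> list[str]:
    # Stand-in LLM draft split into claims; the second one is hallucinated.
    return [
        "Mount Everest is 8,849 metres tall.",
        "Mount Everest was first mapped in 1960.",
    ]

def retrieve(question: str, corpus: list[str]) -> list[str]:
    # Toy retrieval: keep passages sharing any word with the question.
    q = set(question.lower().split())
    return [p for p in corpus if q & set(p.lower().split())]

def is_grounded(claim: str, passages: list[str]) -> bool:
    # Toy fact check: every content word of the claim appears in a passage.
    words = {w.strip(".,").lower() for w in claim.split() if len(w) > 3}
    support = {w.strip(".,").lower() for p in passages for w in p.split()}
    return words <= support

def answer(question: str) -> str:
    passages = retrieve(question, WIKI)
    grounded = [c for c in llm_draft(question) if is_grounded(c, passages)]
    extra = [p for p in passages if p not in grounded]
    return " ".join(grounded + extra)  # grounded facts + retrieved info

print(answer("How tall is Mount Everest?"))
# Mount Everest is 8,849 metres tall. Mount Everest lies on the border of Nepal and China.
```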