What are Hallucinations (in AI)?

Hallucinations in AI refer to instances where an artificial intelligence system generates information or responses that are incorrect, misleading, or entirely fabricated, yet presents them as factual. This often happens in language models or image generators when the AI produces outputs that are not supported by its training data or by real-world facts.
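As a rough illustration, the Python sketch below shows a hallucinated answer being flagged by comparing it against a trusted reference text. The `fake_model_answer` function is a hypothetical stub standing in for a real model call, and the word-overlap check is a deliberately naive stand-in for real fact verification.

```python
# Toy illustration of a hallucination being caught by a grounding check.
# fake_model_answer is a hypothetical stub, not a real LLM API.

REFERENCE = (
    "The Eiffel Tower was completed in 1889 and stands in Paris. "
    "It was designed and built by Gustave Eiffel's engineering company."
)

def fake_model_answer(question: str) -> str:
    # Confidently wrong output, presented as factual: a hallucination.
    return "The Eiffel Tower was completed in 1925 and stands in Lyon."

def unsupported_claims(answer: str, reference: str) -> list[str]:
    """Flag sentences containing content words absent from the reference.
    A deliberately naive word-overlap check, not real fact verification."""
    ref_words = {w.strip(".,").lower() for w in reference.split()}
    flagged = []
    for sentence in answer.split(". "):
        content = {w.strip(".,").lower() for w in sentence.split() if len(w) > 3}
        if content - ref_words:  # some content word has no support
            flagged.append(sentence)
    return flagged

answer = fake_model_answer("When was the Eiffel Tower completed?")
print(unsupported_claims(answer, REFERENCE))
# ['The Eiffel Tower was completed in 1925 and stands in Lyon.']
```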


Hallucinations mentioned at Stanford HAI

Explore Similar Terms:

AI Safety | AI Alignment | Large Language Model (LLM)

See Full List of Terms & Definitions

AI on Trial: Legal Models Hallucinate in 1 out of 6 (or More) Benchmarking Queries
Faiz Surani, Daniel E. Ho
May 23
news

A new study reveals the need for benchmarking and public evaluations of AI tools in law.
Hallucinating Law: Legal Mistakes with Large Language Models are Pervasive
Matthew Dahl, Varun Magesh, Mirac Suzgun, Daniel E. Ho
Jan 11
news

A new study finds disturbing and pervasive errors among three popular models on a wide range of legal tasks.
Reduce AI Hallucinations With This Neat Software Trick
WIRED
Jun 14
media mention

Stanford HAI Senior Fellow Dan Ho gives input on how to reduce AI hallucinations and discusses his research into AI legal tools that rely on retrieval-augmented generation (RAG).

Generative AI
Law Enforcement and Justice
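For context, retrieval-augmented generation reduces hallucination by retrieving relevant passages first and constraining the model to answer only from them. The sketch below is a minimal toy version of that idea; `call_llm`, the sample corpus, and the word-overlap retriever are all hypothetical stand-ins rather than any specific tool's API.

```python
# Minimal retrieval-augmented generation (RAG) sketch: retrieve passages
# first, then constrain the model to answer only from them. call_llm and
# the word-overlap retriever are hypothetical stand-ins, not a real API.

CORPUS = [
    "Rule 11 sanctions apply to filings made for an improper purpose.",
    "Summary judgment is proper when there is no genuine dispute of material fact.",
    "The notice of appeal must be filed within 30 days of the judgment.",
]

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Rank passages by crude word overlap with the query; a real system
    # would use a vector index or a search engine here.
    q = set(query.lower().split())
    return sorted(corpus, key=lambda p: -len(q & set(p.lower().split())))[:k]

def call_llm(prompt: str) -> str:
    # Hypothetical stub; a real system would query an actual model.
    return "(model answer constrained to the cited passages)"

def rag_answer(question: str) -> str:
    passages = retrieve(question, CORPUS)
    prompt = (
        "Answer using ONLY the passages below. If they are insufficient, "
        "say so instead of guessing.\n\n"
        + "\n".join(f"- {p}" for p in passages)
        + f"\n\nQuestion: {question}"
    )
    return call_llm(prompt)

print(rag_answer("When is summary judgment proper?"))
```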
WikiChat: Stopping the Hallucination of Large Language Model Chatbots by Few-Shot Grounding on Wikipedia
Sina Semnani, Violet Yao, Monica Lam, Heidi Zhang
Dec 01
Research

This paper presents the first few-shot LLM-based chatbot that almost never hallucinates and has high conversationality and low latency. WikiChat is grounded on the English Wikipedia, the largest curated free-text corpus. WikiChat generates a response from an LLM, retains only the grounded facts, and combines them with additional information it retrieves from the corpus to form factual and engaging responses. We distill WikiChat based on GPT-4 into a 7B-parameter LLaMA model with minimal loss of quality, to significantly improve its latency, cost and privacy, and facilitate research and deployment. Using a novel hybrid human-and-LLM evaluation methodology, we show that our best system achieves 97.3% factual accuracy in simulated conversations. It significantly outperforms all retrieval-based and LLM-based baselines, and by 3.9%, 38.6% and 51.0% on head, tail and recent knowledge compared to GPT-4. Compared to previous state-of-the-art retrieval-based chatbots, WikiChat is also significantly more informative and engaging, just like an LLM. WikiChat achieves 97.9% factual accuracy in conversations with human users about recent topics, 55.0% better than GPT-4, while receiving significantly higher user ratings and more favorable comments.

Natural Language Processing
Foundation Models
Machine Learning
Generative AI
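The pipeline the abstract describes (draft a response with an LLM, keep only the grounded facts, combine them with retrieved text) can be sketched roughly as below. Every component here is a toy stand-in; the paper's actual stages are few-shot, LLM-based, and operate over real Wikipedia retrieval.

```python
# Rough sketch of the WikiChat-style pipeline described in the abstract:
# draft with an LLM, keep only claims grounded in retrieved passages, then
# compose the reply from surviving facts plus retrieved information.
# Every component here is a toy stand-in for the paper's few-shot stages.

WIKI = [
    "Mount Everest is 8,849 metres tall.",
    "Mount Everest lies on the border of Nepal and China.",
]

def llm_draft(question: str) -> list[str]:
    # Stand-in LLM draft split into claims; the second one is hallucinated.
    return [
        "Mount Everest is 8,849 metres tall.",
        "Mount Everest was first mapped in 1960.",
    ]

def retrieve(question: str, corpus: list[str]) -> list[str]:
    # Toy retrieval: keep passages sharing any word with the question.
    q = set(question.lower().split())
    return [p for p in corpus if q & set(p.lower().split())]

def is_grounded(claim: str, passages: list[str]) -> bool:
    # Toy fact check: every content word of the claim appears in a passage.
    words = {w.strip(".,").lower() for w in claim.split() if len(w) > 3}
    support = {w.strip(".,").lower() for p in passages for w in p.split()}
    return words <= support

def answer(question: str) -> str:
    passages = retrieve(question, WIKI)
    grounded = [c for c in llm_draft(question) if is_grounded(c, passages)]
    extra = [p for p in passages if p not in grounded]
    return " ".join(grounded + extra)  # grounded facts + retrieved info

print(answer("How tall is Mount Everest?"))
# Mount Everest is 8,849 metres tall. Mount Everest lies on the border of Nepal and China.
```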