Chandler Smith

Latest Work

What Makes a Good AI Benchmark?

Malcolm Hardy, Amelia Hardy, Chandler Smith, Max Lamparth, Mykel Kochenderfer, Anka Reuel

Dec 11

policy brief

This brief presents a novel framework for assessing the quality of AI benchmarks and scores 24 benchmarks against it.

Escalation Risks from LLMs in Military and Diplomatic Contexts

Gabriel Mukobi, Jacquelyn Schneider, Juan-Pablo Rivera, Chandler Smith, Max Lamparth, Anka Reuel

May 02

policy brief

In this brief, scholars explain how they designed a wargame simulation to evaluate the escalation risks of large language models (LLMs) in high-stakes military and diplomatic decision-making.

