Chandler Smith

Latest Work
policy brief

This brief presents a novel framework for assessing the quality of AI benchmarks and applies it to score 24 benchmarks.
policy brief

In this brief, scholars explain how they designed a wargame simulation to evaluate the escalation risks of large language models (LLMs) in high-stakes military and diplomatic decision-making.