Max Lamparth

In this brief, scholars explain how they designed a wargame simulation to evaluate the escalation risks of large language models (LLMs) in high-stakes military and diplomatic decision-making.
This brief presents a novel assessment framework for evaluating the quality of AI benchmarks and scores 24 benchmarks against the framework.