Wolfgang Lehrach | Code World Models for General Game Playing
While Large Language Models (LLMs) show promise in many domains, relying on them for direct policy generation in games often results in illegal moves and poor strategic play.
In this talk, I present an approach that moves away from direct prompting, instead using LLMs as program synthesizers to bridge the gap between natural language rules and symbolic world models. The LLM receives a game description and example trajectories, and outputs an executable, symbolic code world model (CWM) represented in Python. The trajectories also help verify that the rules are correctly captured, and aid in refining the CWM when they are not. Note that even trajectories containing only a single player's observations and actions can be used to help validate and refine CWMs. Furthermore, partially observed trajectories also allow comparisons between CWMs via a bound on the likelihood.
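To make this concrete, here is a hypothetical sketch of what a synthesized CWM might look like, using tic-tac-toe as a stand-in game. The class and function names (`TicTacToeCWM`, `validate`, etc.) are illustrative assumptions, not the talk's actual interface; the point is that the model is ordinary executable Python whose rules can be checked by replaying example trajectories.

```python
# Illustrative CWM for tic-tac-toe, of the kind an LLM might synthesize
# from a natural-language rule description. All names are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class State:
    board: tuple = (" ",) * 9   # 3x3 board, row-major
    player: str = "X"           # player to move


class TicTacToeCWM:
    LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
             (0, 3, 6), (1, 4, 7), (2, 5, 8),
             (0, 4, 8), (2, 4, 6)]

    def initial_state(self):
        return State()

    def legal_actions(self, s):
        return [i for i, c in enumerate(s.board) if c == " "]

    def step(self, s, a):
        if s.board[a] != " ":
            raise ValueError(f"illegal move {a}")
        b = list(s.board)
        b[a] = s.player
        return State(tuple(b), "O" if s.player == "X" else "X")

    def winner(self, s):
        for i, j, k in self.LINES:
            if s.board[i] != " " and s.board[i] == s.board[j] == s.board[k]:
                return s.board[i]
        return None

    def is_terminal(self, s):
        return self.winner(s) is not None or " " not in s.board


def validate(cwm, trajectory):
    """Replay a recorded action sequence; reject the CWM if it ever
    disagrees with the trajectory about which moves are legal."""
    s = cwm.initial_state()
    for a in trajectory:
        if a not in cwm.legal_actions(s):
            return False
        s = cwm.step(s, a)
    return True
```

A trajectory that the synthesized rules reject (e.g. a repeated move in a cell the rules say is occupied) signals that the CWM is wrong and should be refined, which is the validation loop the abstract describes.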
Given a CWM, Monte Carlo Tree Search (MCTS) or Reinforcement Learning (RL) methods can then be used to play the game, and gameplay can be further enhanced by adding LLM-synthesized value functions. Imperfect-information games are handled by having the LLM synthesize inference functions to impute information sets, or by directly training reinforcement learning policies on top of the CWM.
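As a sketch of the planning side, the snippet below runs a minimal UCT-style MCTS against a CWM exposing the same kind of interface as above. The single-pile Nim game (`NimCWM`) and all names here are illustrative assumptions, not the talk's implementation; it only shows that once the rules live in executable code, a standard planner can search over them directly.

```python
# Minimal UCT-style MCTS planning against an illustrative CWM.
# NimCWM: single-pile Nim, take 1-3 stones, last stone wins.
import math
import random


class NimCWM:
    def initial_state(self):
        return (10, 0)                      # (stones left, player to move)

    def legal_actions(self, s):
        return [a for a in (1, 2, 3) if a <= s[0]]

    def step(self, s, a):
        return (s[0] - a, 1 - s[1])

    def is_terminal(self, s):
        return s[0] == 0

    def reward(self, s, player):
        # The player who is NOT to move took the last stone and won.
        return 1.0 if s[1] != player else 0.0


def mcts(cwm, state, n_sims=2000, c=1.4):
    """Return the most-visited root action after n_sims UCT simulations."""
    player = state[1]
    N, W, children = {}, {}, {}        # visit counts, total value, expanded nodes

    def rollout(s):
        # Random playout to a terminal state, scored for the root player.
        while not cwm.is_terminal(s):
            s = cwm.step(s, random.choice(cwm.legal_actions(s)))
        return cwm.reward(s, player)

    for _ in range(n_sims):
        path, s = [], state
        while s in children and not cwm.is_terminal(s):   # selection (UCB1)
            total = sum(N[(s, a)] for a in children[s])
            a = max(children[s], key=lambda a:
                    (W[(s, a)] / N[(s, a)] if N[(s, a)] else float("inf"))
                    + c * math.sqrt(math.log(total + 1) / (N[(s, a)] + 1e-9)))
            path.append((s, a))
            s = cwm.step(s, a)
        if not cwm.is_terminal(s):                        # expansion
            children[s] = cwm.legal_actions(s)
            for a in children[s]:
                N[(s, a)], W[(s, a)] = 0, 0.0
            a = random.choice(children[s])
            path.append((s, a))
            s = cwm.step(s, a)
        value = rollout(s)                                # simulation
        for ps, pa in path:                               # backpropagation
            N[(ps, pa)] += 1
            # Credit each node from the perspective of its player to move.
            W[(ps, pa)] += value if ps[1] == player else 1.0 - value

    return max(children[state], key=lambda a: N[(state, a)])
```

An LLM-synthesized value function would slot in where `rollout` is called, replacing or truncating the random playout with a learned state evaluation; similarly, an RL policy can be trained by treating `step` and `legal_actions` as the environment.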