BetaZero: Belief-State Planning for Long-Horizon POMDPs Using Learned Approximations

Date

July 31, 2024

Abstract

Real-world planning problems, including autonomous driving and sustainable energy applications like carbon storage and resource exploration, have recently been modeled as partially observable Markov decision processes (POMDPs) and solved using approximate methods. To solve high-dimensional POMDPs in practice, state-of-the-art methods use online planning with problem-specific heuristics to reduce planning horizons and make the problems tractable. Algorithms that learn approximations to replace heuristics have recently found success in large-scale fully observable domains. The key insight is the combination of online Monte Carlo tree search with offline neural network approximations of the optimal policy and value function. In this work, we bring this insight to partially observable domains and propose BetaZero, a belief-state planning algorithm for high-dimensional POMDPs. BetaZero learns offline approximations that replace heuristics to enable online decision making in long-horizon problems. We address several challenges inherent in large-scale partially observable domains, namely transitioning in stochastic environments, prioritizing action branching with a limited search budget, and representing beliefs as input to the network. To formalize the use of all limited search information, we train against a novel Q-weighted visit counts policy. We test BetaZero on various well-established POMDP benchmarks found in the literature and a real-world problem of critical mineral exploration. Experiments show that BetaZero outperforms state-of-the-art POMDP solvers on a variety of tasks.
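The abstract names a Q-weighted visit counts policy without defining it. The sketch below is one plausible reading, offered as an illustration and not as the paper's exact definition: each root action's share of search visits is multiplied by a softmax weight of its Q-value estimate, and the product is renormalized. The function name and the parameters `z_q`, `z_n`, and `temperature` are hypothetical and do not come from the text above.

```python
import math


def q_weighted_visit_policy(q_values, visit_counts, z_q=1.0, z_n=1.0, temperature=1.0):
    """Combine Q-value estimates and visit counts into one action distribution.

    Assumption (not from the abstract): the policy is proportional to
    (softmax(Q)^z_q * visit_fraction^z_n)^(1 / temperature).
    """
    if not q_values or len(q_values) != len(visit_counts):
        raise ValueError("q_values and visit_counts must be non-empty and equal length")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    total_visits = sum(visit_counts)
    if total_visits <= 0:
        raise ValueError("at least one action must have been visited")

    # Softmax over Q-values, shifted by the maximum for numerical stability.
    q_max = max(q_values)
    exp_q = [math.exp(q - q_max) for q in q_values]
    exp_sum = sum(exp_q)
    q_weights = [e / exp_sum for e in exp_q]

    # Fraction of the search budget spent on each action.
    visit_fractions = [n / total_visits for n in visit_counts]

    # Weight visits by Q, apply the temperature, then renormalize.
    scores = [
        (w ** z_q * f ** z_n) ** (1.0 / temperature)
        for w, f in zip(q_weights, visit_fractions)
    ]
    score_sum = sum(scores)
    return [s / score_sum for s in scores]


if __name__ == "__main__":
    # Two actions explored equally often; the one with higher Q gets more mass.
    print(q_weighted_visit_policy([0.0, 1.0], [10, 10]))
```

Under this reading, an action that search never visited receives zero probability regardless of its Q-value, and when all Q-values are equal the result reduces to the plain visit-count policy.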

Related Publications

ReMix: Optimizing Data Mixtures for Large Scale Imitation Learning

Joey Hejna, Chethan Anand Bhateja, Yichen Jiang, Karl Pertsch, Dorsa Sadigh

Sep 05, 2024

Research

Increasingly large robotics datasets are being collected to train larger foundation models in robotics. However, despite the fact that data selection has been of utmost importance to scaling in vision and natural language processing (NLP), little work in robotics has questioned what data such models should actually be trained on. In this work we investigate how to weigh different subsets or "domains" of robotics datasets during pre-training to maximize worst-case performance across all possible downstream domains using distributionally robust optimization (DRO). Unlike in NLP, we find that these methods are hard to apply out of the box due to varying action spaces and dynamics across robots. Our method, ReMix, employs early stopping and action normalization and discretization to counteract these issues. Through extensive experimentation on both the Bridge and OpenX datasets, we demonstrate that data curation can have an outsized impact on downstream performance. Specifically, domain weights learned by ReMix outperform uniform weights by over 40% on average and human-selected weights by over 20% on datasets used to train the RT-X models.

Computer Vision; Robotics; Natural Language Processing

Utah's Experiment With AI-Driven Prescription Renewals

Michelle Mello

Quick Read

Mar 19, 2026

Research

In January 2026, Utah announced a first-of-its-kind pilot program allowing an autonomous artificial intelligence (AI) agent to renew prescriptions for consumers who request it. The state agreed not to enforce its unprofessional conduct laws against the developer, Doctronic, if the company adheres to a contract that includes safety and privacy protections. The pilot program includes 192 drugs for chronic conditions. Although physicians will initially validate the AI’s actions, the pilot program will swiftly become one of the first deployments at scale of an autonomous, agentic system in medicine. The announcement prompted concern from associations of physicians and pharmacists who opined that AI “should NOT be making care decisions.”

Healthcare; Regulation, Policy, Governance

The AI Arms Race In Health Insurance Utilization Review: Promises Of Efficiency And Risks Of Supercharged Flaws

Michelle Mello, Artem Trotsyuk, Abdoul Jalil Djiberou Mahamadou, Danton Char

Quick Read

Jan 06, 2026

Research

Health insurers and health care provider organizations are increasingly using artificial intelligence (AI) tools in prior authorization and claims processes. AI offers many potential benefits, but its adoption has raised concerns about the role of the “humans in the loop,” users’ understanding of AI, opacity of algorithmic determinations, underperformance in certain tasks, automation bias, and unintended social consequences. To date, institutional governance by insurers and providers has not fully met the challenge of ensuring responsible use. However, several steps could be taken to help realize the benefits of AI use while minimizing risks. Drawing on empirical work on AI use and our own ethical assessments of provider-facing tools as part of the AI governance process at Stanford Health Care, we examine why utilization review has attracted so much AI innovation and why it is challenging to ensure responsible use of AI. We conclude with several steps that could be taken to help realize the benefits of AI use while minimizing risks.

Healthcare; Regulation, Policy, Governance

The Global AI Vibrancy Tool 2025

Loredana Fattorini, Nestor Maslej, Ray Perrault, Vanessa Parli, John Etchemendy, Yoav Shoham, Katrina Ligett

Deep Dive

Nov 24, 2025

Research

This methodological paper presents the Global AI Vibrancy Tool, an interactive suite of visualizations designed to facilitate cross-country comparisons of AI vibrancy, using indicators organized into pillars. The tool offers customizable features that enable users to conduct in-depth country-level comparisons and longitudinal analyses of AI-related metrics.
