BetaZero: Belief-State Planning for Long-Horizon POMDPs Using Learned Approximations
Real-world planning problems, including autonomous driving and sustainable energy applications like carbon storage and resource exploration, have recently been modeled as partially observable Markov decision processes (POMDPs) and solved using approximate methods. To solve high-dimensional POMDPs in practice, state-of-the-art methods use online planning with problem-specific heuristics to reduce planning horizons and make the problems tractable. Algorithms that learn approximations to replace heuristics have recently found success in large-scale fully observable domains. The key insight is the combination of online Monte Carlo tree search with offline neural network approximations of the optimal policy and value function. In this work, we bring this insight to partially observable domains and propose BetaZero, a belief-state planning algorithm for high-dimensional POMDPs. BetaZero learns offline approximations that replace heuristics to enable online decision making in long-horizon problems. We address several challenges inherent in large-scale partially observable domains, namely transitioning in stochastic environments, prioritizing action branching with a limited search budget, and representing beliefs as input to the network. To formalize the use of all limited search information, we train against a novel Q-weighted visit counts policy. We test BetaZero on various well-established POMDP benchmarks found in the literature and a real-world problem of critical mineral exploration. Experiments show that BetaZero outperforms state-of-the-art POMDP solvers on a variety of tasks.
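The abstract does not spell out how the Q-weighted visit counts policy is computed, so the following is only a minimal sketch of one plausible form, assuming the training target multiplies a softmax over the root Q-values by the normalized visit counts and then renormalizes. The function name `q_weighted_policy` and the exponents `z_q`, `z_n`, and `temperature` are hypothetical names chosen for illustration, not taken from the text above.

```python
import math


def q_weighted_policy(q_values, visit_counts, z_q=1.0, z_n=1.0, temperature=1.0):
    """Illustrative policy target combining Q-values and visit counts.

    Assumed form (not confirmed by the abstract):
        score(a) = (softmax(Q)[a] ** z_q * (N[a] / sum(N)) ** z_n) ** (1 / temperature)
    followed by normalization so the scores sum to one.
    """
    if not q_values or len(q_values) != len(visit_counts):
        raise ValueError("q_values and visit_counts must be non-empty and equal length")
    total_visits = sum(visit_counts)
    if total_visits <= 0:
        raise ValueError("at least one action must have been visited")

    # Numerically stable softmax over the Q-values.
    q_max = max(q_values)
    exp_q = [math.exp(q - q_max) for q in q_values]
    exp_sum = sum(exp_q)
    soft_q = [e / exp_sum for e in exp_q]

    # Fraction of the search budget spent on each action.
    visit_frac = [n / total_visits for n in visit_counts]

    # Weight the two sources of search information, then renormalize.
    scores = [
        ((sq ** z_q) * (vf ** z_n)) ** (1.0 / temperature)
        for sq, vf in zip(soft_q, visit_frac)
    ]
    total = sum(scores)
    return [s / total for s in scores]


if __name__ == "__main__":
    # Two actions with equal visits: the higher-Q action gets more mass.
    print(q_weighted_policy([1.0, 0.0], [10, 10]))
```

Under this assumed form, setting `z_q=0` reduces the target to the plain visit-count distribution, while a positive `z_q` shifts probability toward actions whose estimated value is high even when the limited search budget visited them rarely.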
Related Publications
Increasingly large robotics datasets are being collected to train larger foundation models in robotics. However, despite the fact that data selection has been of utmost importance to scaling in vision and natural language processing (NLP), little work in robotics has questioned what data such models should actually be trained on. In this work we investigate how to weigh different subsets or "domains" of robotics datasets during pre-training to maximize worst-case performance across all possible downstream domains using distributionally robust optimization (DRO). Unlike in NLP, we find that these methods are hard to apply out of the box due to varying action spaces and dynamics across robots. Our method, ReMix, employs early stopping and action normalization and discretization to counteract these issues. Through extensive experimentation on both the Bridge and OpenX datasets, we demonstrate that data curation can have an outsized impact on downstream performance. Specifically, domain weights learned by ReMix outperform uniform weights by over 40% on average and human-selected weights by over 20% on datasets used to train the RT-X models.
Current societal trends reflect an increased mistrust in science and a lowered civic engagement that threaten to impair research that is foundational for ensuring public health and advancing health equity. One effective countermeasure to these trends lies in community-facing citizen science applications to increase public participation in scientific research, making this field an important target for artificial intelligence (AI) exploration. We highlight potentially promising citizen science AI applications that extend beyond individual use to the community level, including conversational large language models, text-to-image generative AI tools, descriptive analytics for analyzing integrated macro- and micro-level data, and predictive analytics. The novel adaptations of AI technologies for community-engaged participatory research also bring an array of potential risks. We highlight possible negative externalities and mitigations for some of the potential ethical and societal challenges in this field.
We present a new method of deconstructing class activation tokens of vision transformers into a new, overcomplete basis, where each basis vector is “monosemantic” and affiliated with a single, human-compatible conceptual description. We achieve this through the use of a highly optimized and customized version of the K-SVD algorithm, which we call Double-Batch K-SVD (DBK-SVD). We demonstrate the efficacy of our approach on the sbucaptions dataset, using CLIP embeddings and comparing our results to a Sparse Autoencoder (SAE) baseline. Our method significantly outperforms SAE in terms of reconstruction loss, recovering approximately 2/3 of the original signal compared to 1/6 for SAE. We introduce novel metrics for evaluating explanation faithfulness and specificity, showing that DBK-SVD produces more diverse and specific concept descriptions. We therefore show empirically for the first time that disentangling of concepts arising in Vision Transformers is possible, a statement that has previously been questioned when applying an additional sparsity constraint. Our research opens new avenues for model interpretability, failure mitigation, and downstream task domain transfer in vision transformer models. An interactive demo showcasing our results can be found at https://disentangling-sbucaptions.xyz, and we make our DBK-SVD implementation openly available at https://github.com/RomeoV/KSVD.jl.
Model-based reinforcement learning (MBRL) is a promising route to sample-efficient policy optimization. However, a known vulnerability of reconstruction-based MBRL consists of scenarios in which detailed aspects of the world are highly predictable, but irrelevant to learning a good policy. Such scenarios can lead the model to exhaust its capacity on meaningless content, at the cost of neglecting important environment dynamics. While existing approaches attempt to solve this problem, we highlight its continuing impact on leading MBRL methods, including DreamerV3 and DreamerPro, with a novel environment where background distractions are intricate, predictable, and useless for planning future actions. To address this challenge we develop a method for focusing the capacity of the world model through synergy of a pretrained segmentation model, a task-aware reconstruction loss, and adversarial learning. Our method outperforms a variety of other approaches designed to reduce the impact of distractors, and is an advance towards robust model-based reinforcement learning.