AI enables robots to perform increasingly complex tasks across sectors, from manufacturing to healthcare.
2025 Spring Conference
Stanford’s Vocal Sandbox brings us closer to robots that can adapt, learn, and assist in real time.
Robots are becoming a core building block in engineering and healthcare applications, altering how many industries operate and improving quality of life. With AI, robots gain the ability to learn and adapt so that they can work collaboratively alongside humans and other robots in real-world environments. This industry brief provides a cross-section of key research – at HAI and across Stanford – that turns AI methods into new algorithms for human-robot interaction and robot navigation. Discover how researchers are designing intelligent robots that learn and adapt from human demonstration, and how these systems could disrupt and create markets across a wide range of industries, including manufacturing, healthcare, and autonomous vehicles.
Increasingly large robotics datasets are being collected to train larger foundation models in robotics. However, although data selection has been of utmost importance to scaling in vision and natural language processing (NLP), little work in robotics has questioned what data such models should actually be trained on. In this work we investigate how to weight different subsets, or "domains," of robotics datasets during pre-training to maximize worst-case performance across all possible downstream domains using distributionally robust optimization (DRO). Unlike in NLP, we find that these methods are hard to apply out of the box due to varying action spaces and dynamics across robots. Our method, ReMix, employs early stopping and action normalization and discretization to counteract these issues. Through extensive experimentation on both the Bridge and OpenX datasets, we demonstrate that data curation can have an outsized impact on downstream performance. Specifically, domain weights learned by ReMix outperform uniform weights by over 40% on average and human-selected weights by over 20% on datasets used to train the RT-X models.
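The abstract gives only a high-level description of how ReMix learns domain weights, but the underlying DRO principle – adaptively upweighting the dataset "domains" on which the model currently performs worst – can be sketched as a simple exponentiated-gradient update on the mixture weights. The sketch below is illustrative only: the helper `update_domain_weights`, the step size `eta`, and the toy loss values are assumptions for exposition, not the authors' implementation, which additionally relies on early stopping and action normalization and discretization.

```python
import numpy as np

def update_domain_weights(weights, excess_losses, eta=0.1):
    """One exponentiated-gradient step on the mixture weights: domains with
    higher excess loss are upweighted, then the weights are renormalized so
    they stay on the probability simplex."""
    weights = weights * np.exp(eta * excess_losses)
    return weights / weights.sum()

# Toy usage with three hypothetical domains whose (noisy) excess losses differ.
rng = np.random.default_rng(0)
weights = np.full(3, 1.0 / 3.0)  # start from a uniform mixture
for _ in range(200):
    # In practice these would be per-domain training losses relative to a
    # reference model; the constants here are made up for illustration.
    excess = np.array([0.2, 0.5, 0.9]) + 0.05 * rng.standard_normal(3)
    weights = update_domain_weights(weights, excess, eta=0.05)
print(weights)  # most of the mass ends up on the hardest (highest-loss) domain
```

In a real pipeline, the learned weights would then determine how often each domain is sampled when pre-training the downstream policy.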
Stanford HAI co-director Fei-Fei Li says the next frontier in AI lies in advancing spatial intelligence. In this op-ed, she explains how enabling machines to perceive and interact with the world in 3D can unlock human-centered AI applications for robotics, healthcare, education, and beyond.