Detect, Reject, Correct: Crossmodal Compensation of Corrupted Sensors
DiffImpact: Differentiable Rendering and Identification of Impact Sounds
Learning contact-rich robotic manipulation skills is a challenging problem due to the high dimensionality of the state and action spaces as well as uncertainty from noisy sensors and inaccurate motor control. In this research, Bohg and team explore which representations of raw perceptual data enable a robot to better learn and perform these skills. For manipulation robots in particular, the sense of touch is essential, yet it is non-trivial to manually design a robot controller that combines sensing modalities with very different characteristics.
Bohg will present a line of research exploring how best to fuse information from vision and touch for contact-rich manipulation tasks. While deep reinforcement learning has shown success in learning control policies from high-dimensional inputs, these algorithms are generally impractical to deploy on real robots due to their sample complexity. Bohg and team use self-supervision to learn a compact, multimodal representation of visual and haptic sensory inputs, which can then be used to improve the sample efficiency of policy learning. Bohg presents experiments on a peg-insertion task in which the learned policy generalizes over different geometries, configurations, and clearances while remaining robust to external perturbations. The team also shows how exploiting multiple modalities helps compensate for corrupted sensory data in one of the modalities.
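The core idea above — encoding each modality into a compact code and concatenating the codes into one multimodal state for the policy — can be sketched minimally as follows. This is a hypothetical illustration, not the actual architecture: the encoder weights here are random, whereas in the research they would be trained with self-supervised objectives, and the feature dimensions are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder weights (random for illustration only; the actual
# work learns these with self-supervision rather than drawing them at random).
W_vision = rng.standard_normal((16, 128))  # 128-dim image features -> 16-dim code
W_haptic = rng.standard_normal((8, 32))    # 32-dim force/torque features -> 8-dim code

def encode(x, W):
    """Project raw sensor features into a compact, bounded code."""
    return np.tanh(W @ x)

def fuse(vision_feat, haptic_feat):
    """Concatenate per-modality codes into one multimodal state for the policy."""
    return np.concatenate([encode(vision_feat, W_vision),
                           encode(haptic_feat, W_haptic)])

state = fuse(rng.standard_normal(128), rng.standard_normal(32))
print(state.shape)  # (24,) -- far smaller than the 160 raw sensor dimensions
```

The compactness is the point: a policy learned on the 24-dimensional fused code needs far fewer real-robot samples than one learned directly on raw pixels and force readings.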
Another modality that has been under-explored in robotic manipulation is sound. Rigid objects make distinctive sounds during manipulation. These sounds are a function of object features, such as shape and material, and of the contact forces during manipulation. Being able to infer from sound an object's acoustic properties, how it is being manipulated, and what events it is participating in could augment and complement what robots perceive through vision, especially in cases of occlusion, low visual resolution, poor lighting, or blurred focus. Bohg will present a fully differentiable model of the sounds rigid objects make during impacts, based on physical principles of impact forces, rigid-object vibration, and other acoustic effects. Its differentiability enables efficient, gradient-based joint inference of the objects' acoustic properties and the characteristics and timing of each individual impact. Bohg will conclude the talk with a discussion of appropriate representations for multimodal sensory data.
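To see why differentiability matters here, consider a toy stand-in for the sound model: a single damped sinusoidal vibration mode (the actual model combines many modes with impact-force and acoustic effects). Because the model is differentiable in its physical parameters, a property such as the damping coefficient — which depends on material — can be recovered from an observed waveform by gradient-based fitting. Everything below (the waveform model, parameter values, and optimizer) is an illustrative assumption, not the paper's implementation.

```python
import numpy as np

t = np.linspace(0.0, 0.5, 2000)  # half a second of audio samples

def impact_sound(freq, damping, amp=1.0):
    """Toy differentiable sound model: one damped sinusoidal vibration mode."""
    return amp * np.exp(-damping * t) * np.sin(2 * np.pi * freq * t)

# Synthetic "recording" with a known damping coefficient to recover.
target = impact_sound(freq=440.0, damping=8.0)

# Gauss-Newton inference of the damping (an acoustic/material property),
# using the analytic derivative of the model w.r.t. damping.
d = 2.0
for _ in range(100):
    pred = impact_sound(440.0, d)
    resid = pred - target
    J = -t * pred                      # d(pred)/d(damping)
    d -= (J @ resid) / (J @ J)         # Gauss-Newton step on 0.5*||resid||^2

print(round(d, 3))  # converges toward the true damping of 8.0
```

The same principle scales up: with a fully differentiable renderer, gradients flow to all acoustic parameters and impact timings jointly, so they can be inferred together rather than searched over one at a time.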
Assistant Professor of Computer Science, Stanford University
Bohg is a professor of robotics in the Computer Science Department at Stanford University and part of the Stanford AI Lab. She directs the Interactive Perception and Robot Learning Lab, and her research sits at the intersection of robotics, machine learning, and computer vision.