HAI Weekly Seminar with Jeannette Bohg
On the Role of Vision, Touch and Sound for Robustness and Generalisability in Robotic Manipulation
Learning contact-rich robotic manipulation skills is a challenging problem because the state and action spaces are high-dimensional and because noisy sensors and inaccurate motor control introduce uncertainty. In this research, Bohg and team explore which representations of raw perceptual data enable a robot to better learn and perform these skills. For manipulation robots in particular, the sense of touch is essential, yet it is non-trivial to manually design a robot controller that combines sensing modalities with very different characteristics.
Bohg will present a body of research that explores how best to fuse information from vision and touch for contact-rich manipulation tasks. While deep reinforcement learning has shown success in learning control policies from high-dimensional inputs, these algorithms are generally intractable to deploy on real robots because of their sample complexity. Bohg and team use self-supervision to learn a compact, multimodal representation of visual and haptic sensory inputs, which can then be used to improve the sample efficiency of policy learning. Bohg will present experiments on a peg insertion task in which the learned policy generalizes over different geometries, configurations, and clearances while remaining robust to external perturbations. The team also shows how exploiting multiple modalities helps to compensate for corrupted sensory data in one of the modalities.
Another modality that has been under-explored in robotic manipulation is sound. Rigid objects make distinctive sounds during manipulation. These sounds are a function of object features, such as shape and material, and of the contact forces during manipulation. Being able to infer from sound an object's acoustic properties, how it is being manipulated, and what events it is participating in could augment and complement what robots perceive from vision, especially in cases of occlusion, low visual resolution, poor lighting, or blurred focus. Bohg will present a fully differentiable model of the sounds that rigid objects make during impacts, based on the physical principles of impact forces, rigid object vibration, and other acoustic effects. Its differentiability enables efficient, gradient-based joint inference of the objects' acoustic properties and of the characteristics and timing of each individual impact. Bohg will conclude the talk with a discussion of appropriate representations for multimodal sensory data.
Assistant Professor of Computer Science, Stanford University