HAI Weekly Seminar with Jiajun Wu
Learning to See the Physical World
Human intelligence goes beyond pattern recognition. From a single image, we can explain what we see, reconstruct the scene in 3D, predict what will happen next, and plan our actions accordingly. In this talk, I will present our recent work on physical scene understanding: building versatile, data-efficient, and generalizable machines that learn to see, reason about, and interact with the physical world. The core idea is to exploit the generic, causal structure behind the world, including knowledge from computer graphics, physics, and language, in the form of approximate simulation engines, and to integrate them with deep learning. Here, deep learning plays two major roles: first, it learns to invert simulation engines for efficient inference; second, it learns to augment simulation engines for constructing powerful forward models. I'll focus on a few topics to demonstrate this idea: building scene representations for both object geometry and physics; learning expressive dynamics models for planning and control; and perception and reasoning beyond vision.