Researchers’ neural network creates high-quality 3-D images that could be a game-changer for education, gaming, and remote work.
Imagine hosting a 3-D virtual conference in which your guests are in different cities but appear to be sitting right across the conference table from you. Not only that, your guests can walk around your very real room to chat with colleagues or make notes on your whiteboard.
It’s like the holodeck from Star Trek, right?
That kind of real-time experience has been out of reach, but it may be a bit closer now.
Combining artificial intelligence with the physics of optics, Stanford researchers have created a system that trains itself to create algorithms good enough to instantly reproduce real-world scenes in all their three-dimensional, ever-changing complexity.
Led by Gordon Wetzstein, assistant professor of electrical engineering, the Stanford team has unveiled a neural network that teaches itself the skills it needs by using a “camera in the loop” to evaluate the accuracy of the images it projects and then learn from its errors.
“The big challenge has been that we don’t have algorithms that are good enough to model all the physical aspects of how light propagates in a complex optical system such as AR eyeglasses,” Wetzstein says. “The algorithms we have at the moment are limited in two ways. They’re computationally inefficient, so it takes too long to constantly update the images. And in practice, the images don’t look that good.”
Wetzstein says the new approach makes big advances in both real-time image generation and image quality. In head-to-head comparisons, he says, the algorithms developed by their “Holonet” neural network generated clearer and more accurate 3-D images, on the spot, than traditional holographic software.
That has big practical applications for virtual and augmented reality, well beyond the obvious arenas of gaming and virtual meetings. Real-time holography has tremendous potential for education, training, and remote work. An aircraft mechanic, for example, could learn by exploring the inside of a jet engine thousands of miles away, or a cardiac surgeon could practice a particularly challenging procedure.
In addition to Wetzstein, the system was created by Yifan Peng, a postdoctoral fellow in computer science at Stanford; Suyeon Choi, a PhD student in electrical engineering at Stanford; Nitish Padmanaban, who just completed his PhD in electrical engineering at Stanford; and Jonghyun Kim, a senior research scientist at Nvidia Corp.
Camera in the Loop
As in many other applications of machine learning, the Holonet neural network refines its algorithms by practicing relentlessly on a set of training images and learning from its errors.
The key, says Wetzstein, was to incorporate a real camera into the AI training sessions. The neural network begins by attempting to reproduce a 3-D image and projecting it onto a display. A digital camera then captures the projected image and feeds it back into the system, which compares the projection against the original.
Over time, the system gets better and better at creating accurate 3-D images. Eventually, says Wetzstein, it becomes capable of reproducing novel images that it never encountered in the training data.
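The feedback cycle described above (project, photograph, compare, adjust) can be illustrated with a toy simulation. The sketch below is not the researchers' Holonet code. It assumes a hypothetical display whose per-pixel gain and offset are unknown to the optimizer, stands in for the camera with a simple `capture` function, and replaces the neural network with a plain per-pixel correction step. All names and numbers are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hardware imperfections, hidden from the optimizer.
_GAIN = rng.uniform(0.6, 1.0, size=(8, 8))
_OFFSET = rng.uniform(0.0, 0.1, size=(8, 8))


def capture(pattern):
    """Stand-in for 'project the pattern on the display, then photograph it'."""
    return _GAIN * pattern + _OFFSET


def camera_in_the_loop(target, steps=60, lr=0.5):
    """Adjust the displayed pattern until the captured image matches the target."""
    pattern = target.copy()  # first guess: assume a perfect display
    errors = []
    for _ in range(steps):
        captured = capture(pattern)         # what the camera actually sees
        residual = captured - target        # compare against the original
        errors.append(float(np.mean(residual ** 2)))
        pattern -= lr * residual            # learn from the error
    return pattern, errors


target = rng.uniform(0.2, 0.8, size=(8, 8))
pattern, errors = camera_in_the_loop(target)
print(f"error before: {errors[0]:.2e}, after: {errors[-1]:.2e}")
```

The point of the sketch is that the loop never needs to know the display's gain or offset. The captured image carries that information, which is roughly why putting a real camera in the training loop can compensate for physical effects that a purely simulated model would miss.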
It sounds straightforward, until you consider the staggering number of potential wave patterns in a three-dimensional image that is constantly changing.
“Think about light waves as the ripples in a pond after a rock hits the water,” Wetzstein suggests. “The pixels on a display are like the rocks in that they control the wave patterns of light that create an image you want to display. To create a complex image, you’ll have to throw a lot of rocks in just the right way. Now think about how you would throw those rocks to create the light waves that add up to the holographic video of Princess Leia in Star Wars.”
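The pond analogy can be made concrete with a few lines of wave arithmetic. The sketch below is a simplified scalar model, not the optical model used by the Stanford team. It assumes a row of 64 idealized point emitters (the "rocks"), a wavelength of 0.5 micrometers, and an 8-micrometer pixel pitch, and it ignores how amplitude falls off with distance. It compares the brightness at one target point when each emitter's phase is chosen to cancel its path delay against the brightness when the phases are random.

```python
import numpy as np

wavelength = 0.5e-6                  # assumed: 0.5 micrometers
k = 2 * np.pi / wavelength           # wavenumber
xs = (np.arange(64) - 31.5) * 8e-6   # emitter positions on the line z = 0
target = np.array([0.0, 0.05])       # point 5 cm in front of the emitters


def intensity_at(point, phases):
    """Brightness at `point` from summing one unit-amplitude wave per emitter."""
    distances = np.hypot(point[0] - xs, point[1])
    field = np.sum(np.exp(1j * (k * distances + phases)))
    return float(np.abs(field) ** 2)


# "Throwing the rocks just right": cancel each emitter's path delay
# so every ripple arrives at the target in step.
focus_phases = -k * np.hypot(target[0] - xs, target[1])

rng = np.random.default_rng(1)
random_phases = rng.uniform(0.0, 2 * np.pi, size=xs.size)

focused = intensity_at(target, focus_phases)
scrambled = intensity_at(target, random_phases)
print(f"focused: {focused:.1f}, scrambled: {scrambled:.1f}")
```

Even this one-point example requires 64 coordinated phase choices. A full 3-D scene asks for the right brightness at every point at once, across millions of pixels, which gives a sense of why the search space is so large.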
It’s too much data and too many possible permutations for humans to figure out from scratch. But by combining a knowledge of physics with the ability of machine learning to tirelessly search for new patterns, a neural network can develop entirely new algorithms to solve 3-D problems that have vexed experts for decades.
As intriguing as 3-D games and conferences might seem, the researchers say real-time holography also has huge implications for augmented reality. Using AR-equipped eyeglasses, for example, people could turn ordinary physical objects in front of them into virtual keyboards or dashboards that control computers and machinery hundreds of miles away. Holography could even allow musicians to play virtual instruments and collaborate in virtual orchestras.
Wetzstein says the algorithms are within reach, and tech companies are making great strides in miniaturizing the necessary hardware, from eyeglasses that project holograms into a person’s eye to inexpensive 3-D cameras. Indeed, Apple recently incorporated a 3-D scanner into its iPad Pro tablet.
“With this work, we’ve gotten closer to the broader goal of creating visual experiences that are indistinguishable from the real world,” says Wetzstein. “It will be different from how Star Trek envisioned the holodeck, because you won’t need a dedicated room — just a tiny holographic display inside your eyeglasses. But by combining the power of modern neural networks with the techniques of classical physical optics, I think the goal is genuinely achievable.”
Stanford HAI's mission is to advance AI research, education, policy and practice to improve the human condition.