DeepFovea: Using deep learning for foveated reconstruction in AR-VR

November 18, 2019

What the research is:

DeepFovea is a new AI-powered foveated rendering system for augmented and virtual reality displays. It renders images using an order of magnitude fewer pixels than previous systems, producing a full-quality experience that is realistic and gaze-contingent.

This is the first practical generative adversarial network (GAN) that is able to generate a natural-looking video sequence conditioned on a very sparse input. In our tests, DeepFovea reduced the compute resources needed for rendering by as much as 10-14x, while any image differences remained imperceptible to the human eye.

We're sharing the full graph structure and materials here.

How it works:

When the human eye looks directly at an object, it sees it in great detail. Peripheral vision, on the other hand, is much lower quality, but because the brain infers the missing information, humans don’t notice. DeepFovea uses recent advances in GANs that can similarly “in-hallucinate” missing peripheral details by generating content that is perceptually consistent.

The system is trained on a large number of video sequences whose pixel density has been decreased before they are fed to the network as input. The sparse input simulates the peripheral image degradation, and the corresponding target helps the network learn how to fill in the missing details based on statistics from all the videos it has seen.

The result is a natural-looking video generated from a stream of sparse pixels whose density has been decreased by as much as 99 percent along the periphery of a 60x40-degree field of view. The system also keeps flicker, aliasing, and other video artifacts in the periphery below the threshold that the human eye can detect. (A sample video is available here.)
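To make the idea of a gaze-contingent sparse input more concrete, here is a minimal Python sketch of how such an input mask might be built. It is an illustration under stated assumptions, not DeepFovea's actual sampling scheme: the function names, the 5-degree full-density region, the linear falloff over 15 degrees, and the flat degrees-per-pixel mapping are all hypothetical. Only the 60x40-degree field of view and the 99 percent peripheral reduction come from the description above.

```python
import math
import random


def sampling_density(ecc_deg, fovea_deg=5.0, falloff_deg=15.0, min_density=0.01):
    """Fraction of pixels to keep at a given eccentricity (degrees from gaze).

    Hypothetical falloff: full density inside the foveal region, then a linear
    drop to min_density (1 percent kept, i.e. a 99 percent reduction).
    """
    if ecc_deg <= fovea_deg:
        return 1.0
    t = min(1.0, (ecc_deg - fovea_deg) / falloff_deg)
    return 1.0 + t * (min_density - 1.0)


def foveated_mask(width, height, gaze_xy, fov_deg=(60.0, 40.0), seed=0):
    """Return a height x width grid of booleans; True marks a pixel to render.

    Assumes a flat mapping from pixels to degrees, which is only a rough
    approximation of real display optics.
    """
    rng = random.Random(seed)
    deg_per_px_x = fov_deg[0] / width
    deg_per_px_y = fov_deg[1] / height
    gaze_x, gaze_y = gaze_xy
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Angular distance of this pixel from the gaze point.
            ecc = math.hypot((x - gaze_x) * deg_per_px_x,
                             (y - gaze_y) * deg_per_px_y)
            # Keep the pixel with probability equal to the local density.
            row.append(rng.random() < sampling_density(ecc))
        mask.append(row)
    return mask


def kept_fraction(mask):
    """Share of pixels in the mask that would actually be rendered."""
    total = sum(len(row) for row in mask)
    return sum(sum(row) for row in mask) / total


if __name__ == "__main__":
    mask = foveated_mask(120, 80, gaze_xy=(60, 40))
    print(f"fraction of pixels kept: {kept_fraction(mask):.3f}")
```

In a setup like this, the renderer would shade only the pixels marked True, and a reconstruction network would fill in the rest. The overall fraction of pixels kept depends heavily on the assumed falloff parameters, so the numbers this sketch prints should not be read as DeepFovea's reported savings.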

Why it matters:

High-quality AR and VR experiences require high image resolution, high frame rate, and multiple views, which can be extremely resource-intensive. To advance these systems and bring them to a wider range of devices, such as those with mobile chipsets and small, portable batteries, we will need to dramatically increase rendering efficiency.

DeepFovea shows how deep learning can help accomplish this task via foveated reconstruction. This approach is hardware-agnostic, which makes it a promising tool for potential use in next-gen head-mounted display technologies. As the community explores ways to use eye-tracking in AR and VR, building gaze-contingent technologies like DeepFovea will be particularly useful. The system is one of several research projects we have introduced to improve AR/VR graphics. It follows DeepFocus, which uses AI to address a different challenge: accommodation.