For several years, manufacturers of VR headsets have used the term "Foveated Rendering" to emphasize the advanced nature of their devices. Research on this type of image and scene creation, with different degrees of sharpness and detail across the user's field of vision, has been going on for over 30 years. But only in recent years does the technology seem to have become powerful enough to enable foveated rendering: on the HTC Vive Pro Eye, on the Oculus Quest / Meta Quest, on the Sony PlayStation VR 2 (PSVR2), and on the Apple Vision Pro, presented on June 5, 2023. But what exactly is foveated rendering?
What is foveated rendering? What are the differences between the static and the dynamic application? And how does it all work in the Apple Vision Pro headset? Here are answers to these questions.

The Term: What is Foveated Rendering?
Foveated rendering describes image synthesis for VR and AR applications on appropriate headsets that takes the direction of gaze into account, so that image content in focus is displayed more sharply and in more detail, while content viewed only peripherally is rendered less sharply and with less detail. Put more simply: if you use a virtual reality headset with foveated rendering, only the parts of the display content you are looking at are displayed at the highest resolution. What is not viewed directly is not rendered at full resolution or with the most detailed textures.
From "Foveated rendering: A state-of-the-art survey" by Lili Wang, Xuehuai Shi and Yi Liu. The link to the study is at the bottom of this post.
The advantage is that the highest image quality can be used in the direct line of sight, while the edges of the image require less computing power. However, this poses several challenges for technical research, the manufacturing companies, the developers of the software and games used and, last but not least, the devices themselves. Because eye movements are fast, the brain's reaction time is just as rapid, and the VR headset is only as good as the technology built into it.
Where does the name "Foveated Rendering" come from?
The first part of the term, "foveated," refers to the fovea centralis in the human eye. This Latin term means something like "central pit" and describes the so-called visual pit on the retina. This is the area of sharpest vision (foveal vision). The fovea centralis measures around 1.5 mm in diameter and contains around 147,000 light receptors (cones) per square millimeter, mainly M and L cones for the green and red ranges of visible light, and fewer S cones for blue light.
The fovea centralis is located on the temporal side next to the optic nerve of the eye. Figure by Hans-Werner Hunziker under CC BY-SA 3.0 license at Wikimedia.
The second part of the term, "rendering," refers to the computer-assisted generation of images from raw data. In operating systems, software with a graphical interface, video games, apps and the like, data, commands, coordinates and other information are used to generate an image output that is then displayed to the user on the output device. When generating individual image files, the speed of the rendering process matters; in the case of moving-image and multimedia content, the process is repeated many times in a short space of time. The frame rate specifies how many frames per second are calculated and output. As a complete term, "foveated rendering" describes the creation of computer graphics or virtual scenes that are only calculated fully and sharply in the direct viewing direction. Anything outside the straight line to the fovea centralis is rendered less sharply and/or with less elaborate textures. The rendering process can thus be accelerated and uses fewer resources, while users do not have to accept any (noticeable) reduction in image quality. For example, VR games can be displayed in 4K resolution in the center of the field of view, while a lower resolution is used at the edges of the display: an interplay of foveal and peripheral vision.
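As an illustration, the basic idea can be sketched as a function that maps a pixel's angular distance from the gaze direction to a resolution scale. The zone boundaries and scale factors below are illustrative assumptions, not values from any shipping headset:

```python
import math

def shading_scale(pixel_angle_deg: float) -> float:
    """Hypothetical three-zone foveation: return a resolution scale
    (1.0 = full resolution) by angular distance from the gaze point."""
    if pixel_angle_deg <= 5.0:      # foveal zone: full resolution
        return 1.0
    elif pixel_angle_deg <= 20.0:   # parafoveal zone: half resolution
        return 0.5
    else:                           # periphery: quarter resolution
        return 0.25

def angular_distance(gaze, pixel) -> float:
    """Angle in degrees between two unit direction vectors."""
    dot = sum(g * p for g, p in zip(gaze, pixel))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

# Example: a pixel 30 degrees away from the gaze direction lands in the
# peripheral zone and is rendered at a quarter of the full resolution.
gaze = (0.0, 0.0, 1.0)
pixel = (math.sin(math.radians(30)), 0.0, math.cos(math.radians(30)))
print(shading_scale(angular_distance(gaze, pixel)))
```

Real implementations use smoother falloff curves and hardware features such as variable-rate shading, but the principle is the same: render cost is concentrated where the fovea is pointed.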
The Technique: How Does Foveated Rendering Work?
There are different approaches to realizing this technique, depending on the type of foveated rendering to be offered. The most convenient and natural for users is Dynamic Foveated Rendering, in which the direction of gaze is determined by eye tracking and the sharp image area is repositioned accordingly. However, this is the most technically complex implementation. Fixed or Static Foveated Rendering assumes that the user is only looking straight ahead or at another fixed point, which is why only that same point is rendered completely sharp and the sharpness or level of detail decreases towards the edges of the image.
Without Eye Tracking: Static / Fixed Foveated Rendering
The advantage of fixed foveated rendering is that the virtual reality headset does not have to incorporate a new eye position, and thus a new viewing direction, into every frame it calculates. This means less computing effort for input evaluation, so the processors are under less load. Comparatively less demanding hardware can be used and/or power consumption can be reduced. Furthermore, manufacturers can save on the eye-tracking technology, making the device cheaper. The disadvantage, of course, is that the headset only displays the center of the image, or image content rated as interesting, at the highest resolution. In a video game, it could be that only the game character and its immediate surroundings are fully rendered, while areas further away appear blurry. Here the developers have to rely on game tests and associated measurements of gaze direction; it is not possible to react ad hoc to where the players are really looking.
With Eye Tracking: Dynamic Foveated Rendering
The advantage of Dynamic Foveated Rendering (also "Dynamically Foveated Rendering") is that the user's direction of view is included in the image calculation in the VR headset. If you look to the left in the virtual scene, the left side of the display is sharp, the center is less sharp and the right display areas have the lowest resolution. If you look to the right, the right side of the display has the higher resolution, and so on. This technique allows for more natural usage and doesn't require developers to anticipate where users will look in the application. The disadvantage is that the headset has to take the user's line of sight into account for every image calculation. This has to happen extremely quickly, because you can focus on a wide variety of objects in a wide variety of directions within a short time. Rapid eye movements, the fast registration of image content, and unpredictable changes of mind when selecting important scene content pose a challenge here. The registration of eye movement (eye tracking), the calculation of the field of view and the corresponding image synthesis must take place within a small fraction of a second. This requires the latest technology and optimized processes.
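The difference between the two modes boils down to where the sharp region is anchored each frame. A minimal sketch, with hypothetical function and field names (real headset SDKs expose this through their own APIs):

```python
from dataclasses import dataclass

@dataclass
class FoveaRegion:
    x: float  # normalized screen coordinates, 0..1
    y: float

def static_fovea(_gaze_sample) -> FoveaRegion:
    """Fixed foveated rendering: the sharp region never moves,
    regardless of what the eye tracker reports (if one exists at all)."""
    return FoveaRegion(0.5, 0.5)

def dynamic_fovea(gaze_sample) -> FoveaRegion:
    """Dynamic foveated rendering: the sharp region is re-centered on
    the tracked gaze point for every frame."""
    return FoveaRegion(gaze_sample["x"], gaze_sample["y"])

# Simulated eye-tracker sample: the user looks toward the upper left.
sample = {"x": 0.2, "y": 0.8}
print(static_fovea(sample))   # stays at the screen center
print(dynamic_fovea(sample))  # follows the gaze
```

In a real renderer, the returned region would then drive the foveation map (or variable-rate shading setup) for that frame.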
Video: Foveated Rendering Demo in just 45 seconds
The video shows how foveated rendering can work in combination with deep learning. For this, the viewed object was given a higher pixel density, and 95% of the pixels were removed from the rest of the image, simulating the falling resolution of the retina of the eye. The missing pixels were then filled in by an AI, which leads to increasingly abstract forms with growing distance from the viewed object, but is barely noticeable overall, since the distant content is only seen peripherally and is therefore not perceived in detail by the brain. https://www.youtube.com/watch?v=NPK8eQ4o8Pk
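The sampling idea behind such a demo can be sketched as a keep-probability that falls off with angular distance from the gaze point (eccentricity). The falloff constants below are illustrative assumptions, not values from the demo:

```python
import math

def keep_probability(eccentricity_deg: float) -> float:
    """Hypothetical sampling density: render every pixel inside the fovea,
    then let the fraction of rendered pixels decay exponentially outward,
    down to a small floor. An AI infill would reconstruct the rest."""
    if eccentricity_deg <= 5.0:
        return 1.0
    return max(0.05, math.exp(-(eccentricity_deg - 5.0) / 10.0))

# Fraction of pixels actually rendered at various eccentricities:
for ecc in (0, 5, 15, 30, 60):
    print(f"{ecc:2d} deg: {keep_probability(ecc):.2f}")
```

With a falloff like this, the vast majority of peripheral pixels are never rendered at all, which is where the computational savings come from.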
The challenge: reaction to eye movements under 13 milliseconds
Experiments by Mary C. Potter, Brad Wyble, Carl Erick Hagmann and Emily S. McCourt, whose results were first published in 2013, show that people can grasp new image content - or at least the main content of the images - within 13 ms. In detail, the study "Detecting meaning in RSVP at 13 ms per picture", published in Attention, Perception, & Psychophysics, Volume 76, Issue 2, in February 2014, states:
The results of both experiments show that conceptual understanding can be achieved when a novel picture is presented as briefly as 13 ms and masked by other pictures. Even when participants were not given the target name until after they had viewed the entire sequence of six or 12 pictures, their performance was above chance even at 13 ms [...]
The challenge for modern virtual reality headsets with dynamic foveated rendering is to calculate detailed and blurred image areas in a cycle of 0.013 seconds. No wonder, then, that the development of the Apple Vision Pro headset took so long, and that besides the M2 chip the new R1 chip was installed, which is made exclusively for the interpretation of sensor data. Apple's press release for the Vision Pro accordingly says:
while the all-new R1 chip processes input from twelve cameras, five sensors and six microphones, ensuring content feels as if it's happening in real-time in front of the user's eyes. R1 transmits new images to the displays within 12 milliseconds [...]
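To put the 12-millisecond figure in relation to the 13-millisecond perception threshold: the whole motion-to-photon pipeline (gaze capture, foveation update, rendering, display output) has to fit inside that budget. A back-of-the-envelope calculation with assumed stage timings, for illustration only:

```python
# Hypothetical per-frame latency budget for dynamic foveated rendering.
# The individual stage timings are illustrative assumptions, not
# measured values from any device; only the 13 ms threshold (Potter
# et al.) and the 12 ms total (Apple) come from the article.
PERCEPTION_THRESHOLD_MS = 13.0

budget = {
    "eye tracking (capture + evaluation)": 2.0,
    "foveation map update": 0.5,
    "scene rendering": 7.0,
    "display transmission / scan-out": 2.5,
}

total = sum(budget.values())
print(f"total pipeline latency: {total:.1f} ms")
print("within perception threshold:", total <= PERCEPTION_THRESHOLD_MS)
```

However the budget is actually split, every stage that can be shortened (e.g. by offloading sensor interpretation to a dedicated chip like the R1) leaves more of the budget for rendering itself.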
The Apple Vision Pro headset has LEDs and infrared cameras on the inside that measure eye movements. The R1 chip evaluates the data determined in this way so that the image can be generated on the M2 chip as quickly as possible.

How the Technology is Integrated into the Apple Vision Pro Headset
According to Apple, Dynamically Foveated Rendering is part of the visionOS operating system. Developers can use it to customize their content using Apple's Xcode and Reality Composer Pro tools. Unity can also be used natively under visionOS for certain apps and games. The Unity game engine supplements the visionOS SDK (SDK = software development kit), RealityKit, UIKit and the aforementioned Apple offerings for programming, AR design, VR applications and the like.
On the developer page for the topic, it says accordingly:
Now, you can use Unity's robust, familiar authoring tools to create new apps and games or reimagine your existing Unity-created projects for visionOS. Your apps get access to all the benefits of visionOS, like passthrough and Dynamically Foveated Rendering, in addition to familiar Unity features like AR Foundation. By combining Unity's authoring and simulation capabilities with RealityKit-managed app rendering, content created with Unity looks and feels at home on visionOS.
In addition to the PlayStation VR 2 headset (PSVR 2), the Meta Quest and similar devices, there is now also the Apple Vision Pro as a VR and AR headset. The manufacturer points out on its developer page that Dynamically Foveated Rendering is used for the visionOS system and the programs running on it.

Summary: Foveated Rendering in Virtual Reality Headsets
With regard to image content in virtual reality (VR), and in some cases also augmented reality (AR), foveated rendering describes a concentration of computing processes on the area of the user's field of vision in which foveal vision plays a role. Resources can be saved in areas of peripheral vision, i.e. for content that is literally only perceived marginally, since the resolution there is not as high and the textures not as large. In dynamic foveated rendering in particular, the challenge consists in reacting to eye movements in the shortest possible time (13 milliseconds or less) and outputting appropriately adapted image content.
Sources for your own research
Below is a list of sources I used in researching this post. These supplement the content already linked in the article, which can also be viewed as sources:
English Wikipedia article on the subject: view here
German Wikipedia article on the fovea centralis: view here
State-of-the-art study on foveated rendering from 2022, by Lili Wang, Xuehuai Shi & Yi Liu (to be published in early 2023): view here