It’s relatively easy to build a decent eye tracker that works for most people in most situations. At a basic level, all you need is a camera, a light source, and a processing unit. The light illuminates the person’s eyes, increasing the contrast between the pupil and the iris and creating reflections on the cornea. The camera captures images of the person’s eyes, and the processing unit locates the pupil and these corneal reflections. With this information, the known positions of the camera and light source, and the anatomy of the human eye, it is possible to calculate the position and angle of rotation of each eye. Calibrate the eye tracking system by asking the user to look at an object whose position is known, and you have everything you need to determine where a person is looking.
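To make the calibration step concrete, here is a rough sketch of the simpler, interpolation-style variant of this idea (rather than the full 3D eye-model computation described above): the vector from the corneal reflection to the pupil center in each image is mapped to screen coordinates with a polynomial fitted on the calibration points. The second-order model and the function names are illustrative only.

```python
import numpy as np

def features(pupil_xy, glint_xy):
    """Second-order polynomial features of the pupil-glint vector.

    Using the vector between pupil center and corneal reflection (glint)
    makes the mapping more robust to small head movements than using the
    pupil position alone.
    """
    dx = pupil_xy[0] - glint_xy[0]
    dy = pupil_xy[1] - glint_xy[1]
    return np.array([1.0, dx, dy, dx * dy, dx * dx, dy * dy])

def calibrate(pupils, glints, targets):
    """Fit a least-squares mapping from pupil-glint vectors to the known
    on-screen positions of the calibration targets (one row per point)."""
    X = np.array([features(p, g) for p, g in zip(pupils, glints)])
    Y = np.array(targets)                      # shape (n_points, 2)
    coeffs, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coeffs                              # shape (6, 2)

def gaze_point(pupil_xy, glint_xy, coeffs):
    """Map a new pupil/glint observation to an estimated screen position."""
    return features(pupil_xy, glint_xy) @ coeffs
```

With at least six well-spread calibration points the fit is determined; a real product would use more points and a far richer model.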
However, each new use case presents new challenges. I wish there were some kind of secret formula that solved everything, but unfortunately there isn’t. It takes hard, dedicated work to turn a basic eye tracking system into something reliable.
To start with, we usually need to generate massive datasets. We need to know what information to look for and how to slice the data for the target application. A research scenario, for example, doesn’t face the same demanding population coverage requirements as a device-native feature in a mass-market product, such as foveated rendering in a VR headset.
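To give a flavor of what “slicing the data” can mean in practice, here is a small Python sketch that groups recordings by metadata and samples each group evenly; the field names and values are hypothetical, not a description of any real dataset.

```python
from collections import defaultdict
import random

# Hypothetical metadata; a real dataset would have its own, much richer schema.
recordings = [
    {"id": 1, "eyewear": "none",    "eyelid_coverage": "low",  "iris": "dark"},
    {"id": 2, "eyewear": "glasses", "eyelid_coverage": "high", "iris": "light"},
    # ... thousands more ...
]

def slice_by(recordings, keys):
    """Group recordings into strata defined by the given metadata keys."""
    strata = defaultdict(list)
    for rec in recordings:
        strata[tuple(rec[k] for k in keys)].append(rec)
    return strata

def sample_balanced(strata, per_stratum, seed=0):
    """Draw (up to) the same number of recordings from every stratum so that
    rare conditions are not drowned out by the common ones."""
    rng = random.Random(seed)
    sample = []
    for recs in strata.values():
        sample.extend(rng.sample(recs, min(per_stratum, len(recs))))
    return sample

# A mass-market feature might stratify on many attributes at once;
# a research study might only care about one or two.
train_set = sample_balanced(slice_by(recordings, ["eyewear", "eyelid_coverage"]), 500)
```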
And then there's the issue of latency. A graphics-heavy application that uses split rendering, performing some of the computation on the device and some in the cloud, for example, needs low latency both from the network and from the eye tracker. On the other hand, an application that supports eye-controlled menu selection has much looser latency requirements, which leaves room for quite a bit of temporal filtering to enhance the user experience.
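To make that trade-off concrete, here is a minimal sketch of one simple form of temporal filtering, an exponential moving average over the gaze signal; the parameter values are illustrative, and a production filter would be considerably more sophisticated.

```python
class GazeSmoother:
    """Exponential moving average over gaze coordinates.

    A larger alpha follows the raw signal closely (little added latency, more
    jitter); a smaller alpha smooths aggressively (a steadier cursor, but the
    extra lag would be unacceptable for something like foveated rendering).
    """

    def __init__(self, alpha: float):
        self.alpha = alpha
        self._state = None

    def update(self, x: float, y: float):
        if self._state is None:
            self._state = (x, y)
        else:
            sx, sy = self._state
            a = self.alpha
            self._state = (a * x + (1 - a) * sx, a * y + (1 - a) * sy)
        return self._state

# Menu selection can tolerate heavy smoothing...
menu_filter = GazeSmoother(alpha=0.1)
# ...while a rendering pipeline would want the signal nearly raw.
render_filter = GazeSmoother(alpha=0.9)
```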
Some might argue that eye tracking is a pure computer science problem and that machine learning will solve everything for you. Machine learning is indeed a vital part of our solution, but when designing eye tracking algorithms you also need to consider the anatomy of the eye, how the brain interprets visual signals, and the goals of the target application.
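One concrete example of that kind of domain knowledge: gaze is mostly meaningful during fixations, because during saccades the eye rotates very quickly and visual intake is largely suppressed. A simple velocity-threshold classifier in the spirit of the classic I-VT algorithm bakes that physiology straight into the code; the threshold below is illustrative, not a recommended value.

```python
import math

SACCADE_THRESHOLD_DEG_PER_S = 30.0  # illustrative; real systems tune this carefully

def classify_samples(gaze_angles_deg, timestamps_s):
    """Label each gaze sample as 'fixation' or 'saccade' from the angular
    velocity between consecutive samples (velocity-threshold idea).

    gaze_angles_deg: list of (horizontal, vertical) gaze angles in degrees.
    timestamps_s:    list of sample timestamps in seconds.
    """
    if not gaze_angles_deg:
        return []
    labels = ["fixation"]  # the first sample has no velocity estimate
    for i in range(1, len(gaze_angles_deg)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        dtheta = math.dist(gaze_angles_deg[i], gaze_angles_deg[i - 1])
        velocity = dtheta / dt if dt > 0 else 0.0
        labels.append("saccade" if velocity > SACCADE_THRESHOLD_DEG_PER_S else "fixation")
    return labels
```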
But I think the biggest struggle comes when you move from ideation to commercialization. Failure is not an option in a mass-market scenario where millions of devices rely on your technology to be fully functional. Reaching 99% population coverage and beyond means that scenarios and people that were considered outliers during ideation now need to be solved for. Droopy eyelids, make-up covering vital features, prescription glasses, contact lenses, and lazy or dominant eyes are all common. In addition, you will likely need to manage headset slippage, as well as variations in interpupillary distance (IPD), face shape, skin reflectance in near-infrared, iris color, and component and placement tolerances.
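One way to keep that requirement visible during development, sketched below with hypothetical field names and numbers, is to report accuracy per population subgroup instead of as a single global average, so the groups that would otherwise be dismissed as outliers show up explicitly.

```python
from collections import defaultdict
from statistics import mean

def accuracy_by_group(results, key):
    """Break down mean gaze error by a population attribute."""
    groups = defaultdict(list)
    for r in results:
        groups[r[key]].append(r["error_deg"])
    return {group: mean(errors) for group, errors in groups.items()}

# Hypothetical evaluation records, one per test session.
results = [
    {"eyewear": "none",     "error_deg": 0.7},
    {"eyewear": "glasses",  "error_deg": 1.9},
    {"eyewear": "contacts", "error_deg": 0.9},
]
print(accuracy_by_group(results, "eyewear"))
# {'none': 0.7, 'glasses': 1.9, 'contacts': 0.9}
```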