Building for UX: Connecting eye gaze to UI objects

Resource Details

  • Written by

    Lawrence Yau

  • Read time

    12 min

You've decided the time has come to try eye tracking as a form of user input. Maybe you've seen eye tracking technology in products from Sony, Meta, or Apple. Maybe you've thought "wouldn't it be awesome if…" while imagining application control and object interaction driven by eye movements. After all, nothing could be quicker than shooting a glance to choose an item.

Tobii fundamentals of eye tracking illustration

Once you start experimenting with gaze input, it will become clear that it's not like designing for a mouse or touchscreen. Your eyes are always moving. Even when they rest (i.e., during a fixation), there are small, involuntary movements, which you can learn about in Eye Movement: Types and functions explained.

Plus, there is always some uncertainty around where the user is actually looking versus where the eye tracker reports the user is looking. The amount of error is a function of both the eye tracking hardware and the person being tracked. These characteristics of eye movement and input signal quality add unique challenges to the creation of gaze-driven interfaces.

In this article, we'll learn how the basic UI concept of pointing requires special handling when creating interfaces with eye-based input.

What is the user looking at?

The basic function of an eye tracker is to tell the system where the user is looking. In general, that information is represented by a vector in space originating from the eye: the gaze vector. The user's gaze might be provided to applications as a point on the screen, the gaze position. When the gaze vector aligns with an interactive item, that item becomes focused.

Tobii Gaze diagram

The simplest implementation of gaze control for a screen-based UI would be to use gaze position in place of mouse position, then add some mechanism for activation, such as a gesture or button press. The simplest version of gaze control in 3D would be to ray cast from the gaze vector instead of from the controller or hand. While the approach is simple, there are reasons why eye gaze is not a drop-in substitute for hand-based pointer input.
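
To make that concrete, here is a minimal sketch of the 2D case in Python. It assumes a hypothetical tracker API that reports gaze position in screen pixels; the widget list, rectangle format, and activation callback are illustrative only and do not correspond to any specific eye tracking SDK.

    # Minimal gaze-as-pointer sketch: hit-test the reported gaze position
    # against rectangular widgets and activate on an explicit trigger.

    def point_in_rect(px, py, rect):
        """rect is (x, y, width, height) in screen pixels."""
        x, y, w, h = rect
        return x <= px <= x + w and y <= py <= y + h

    def widget_under_gaze(widgets, gaze_x, gaze_y):
        """widgets is a list of (name, rect) pairs; returns the first hit, if any."""
        for name, rect in widgets:
            if point_in_rect(gaze_x, gaze_y, rect):
                return name
        return None

    def on_activate(widgets, gaze_x, gaze_y):
        """Called on a button press or gesture: activate whatever is under gaze."""
        target = widget_under_gaze(widgets, gaze_x, gaze_y)
        if target is not None:
            print(f"Activating {target}")

    # Example usage (coordinates are arbitrary):
    # buttons = [("play", (100, 100, 200, 80)), ("stop", (320, 100, 200, 80))]
    # on_activate(buttons, *latest_gaze_position())   # latest_gaze_position is hypothetical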

Let's look at some important differences: 

How is eye gaze different from mousing or touch? 

Tobii Eye Gaze table

Lowest resolution and stability – Measured gaze can differ from actual gaze by
several degrees or more. Just as touch UIs need larger widgets than mouse-driven UIs to accommodate fingertip-sized input, gaze-driven UIs need even more space for each widget. Consider that a typical touchscreen keyboard is the width of a smartphone, whereas a typical eye-controlled keyboard spans a full-size tablet screen.

Gap between human and input resolution - Eyes can focus on tiny details, just as a mouse pointer can, but eye tracking cannot match the accuracy of a mouse. The conventional onscreen mouse pointer would be inappropriate to use with gaze input since it would nearly always be offset from where the user is looking and would present a visual distraction near the area of focus. In any case, people don’t need to be told where they are looking.

Input is secondary to scanning - Gaze tends to move everywhere due to its primary role of scanning visual information. User feedback and activation mechanisms should be compatible with scanning activity to avoid the Midas Touch problem, where users unintentionally activate objects by gazing upon them.

A deeper discussion of UI design with eye tracking can be found in Interaction Design Fundamentals.

What’s the best way to deal with eye tracker inaccuracy?

You may be wondering how to avoid user frustration when eye tracking accuracy is low and there is no visible pointer to help the user self-correct. Let’s look at several techniques to deal with inaccurate eye gaze information.

Solution #1 – Larger, center-weighted targets

Larger targets are easier to focus on; however, gaze positions near the boundary are still at risk of escaping outside the target. Therefore, the most visually salient features should be located toward the center of the target to guide the user's eyes away from the edges.
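
As a rough illustration, the following Python sketch computes a central "salient zone" inside a target rectangle where the icon or label could be drawn, while the full rectangle remains interactive. The rectangle format and the 0.6 fraction are assumptions for illustration, not recommended values.

    # Center-weighting sketch: the whole rectangle stays interactive, but the
    # visually salient content is confined to a centered sub-region so the
    # user's gaze is drawn away from the edges.

    def salient_zone(rect, fraction=0.6):
        """rect is (x, y, width, height); returns the centered inner rectangle
        where the icon or label should be drawn."""
        x, y, w, h = rect
        inner_w, inner_h = w * fraction, h * fraction
        return (x + (w - inner_w) / 2, y + (h - inner_h) / 2, inner_w, inner_h)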

Center weighted diagram

Advantages

  • Easy and intuitive to implement

Disadvantages

  • Impacts the UI aesthetic, making controls look chunkier
  • Consumes more screen real estate
  • Effectiveness reduced at larger distances in 3D UIs - targets shrink with distance

When to use

If the design is flexible, this is a simple and robust solution.

Solution #2 - Expanded hit region

The active zone of a gaze target is enlarged invisibly to capture gaze positions that are just outside the visual boundary. This technique is used in 2D and 3D interfaces to allow small or irregularly-shaped targets to be more easily activated. The expanded zone is transparent, so the apparent target size does not change.
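
A minimal Python sketch of the 2D case follows, assuming rectangular targets described as (x, y, width, height) tuples. The 24-pixel margin is a placeholder to be tuned through testing; in 3D, the equivalent step would be enlarging the collision mesh.

    # Expanded hit region sketch: the visible rectangle is unchanged, but the
    # hit test accepts gaze positions within an invisible margin around it.

    def hit_with_margin(px, py, rect, margin=24.0):
        x, y, w, h = rect
        return (x - margin <= px <= x + w + margin and
                y - margin <= py <= y + h + margin)

    def widget_under_gaze(widgets, gaze_x, gaze_y, margin=24.0):
        """widgets is a list of (name, rect) pairs. Prefer a direct hit on the
        visible rectangle; otherwise fall back to the expanded zone."""
        for name, rect in widgets:
            if hit_with_margin(gaze_x, gaze_y, rect, margin=0.0):
                return name
        for name, rect in widgets:
            if hit_with_margin(gaze_x, gaze_y, rect, margin):
                return name
        return None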

Tobii Expanded hit region diagram

Advantages  

  • Invisible, respects the visual design 
  • Easy to implement by adding active margins or enlarging the 3D collision mesh 

Disadvantages 

  • Not suitable for overlapping or tightly spaced targets – empty space around targets becomes interactive territory 
  • Hard to ensure clear space around objects in 3D – transparent collision meshes in the foreground may block visible background targets 
  • Getting the right margin/collider scale requires experimentation 

When to use 

Active margins are ideal for 2D grid-based UIs without overlapping or touching targets. The technique can also work in 3D if the caveats above are acceptable.

Solution #3 – Visible gaze direction 

Although problematic for the reasons mentioned above, visualizing gaze direction may make sense in certain circumstances, such as when the UI operation tolerates gaze offsets.

Tobii - Show gaze position

Advantages 

  • Providing user feedback generally empowers users to develop their own strategies for working around inaccuracy

Disadvantages 

  • Distracting and unnatural 
  • May frustrate users who experience larger gaze position offsets
  • May be more trouble than it’s worth 

When to use 

Rarely, if ever. If the interaction design benefits from a rough estimate of gaze, for example to highlight an area of the screen, showing a spotlight effect around the gaze position can provide feedback for UI operations while limiting distraction. The highlighted region should be large enough to encompass the user’s actual gaze. 
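
Here is a Python sketch of the spotlight idea, assuming the application can estimate expected gaze error in degrees and the display's pixels-per-degree. Both values and the drawing call are illustrative placeholders, not part of any real API.

    # Spotlight feedback sketch: size the highlighted region so it comfortably
    # covers the user's actual point of regard despite tracking offsets.

    def spotlight_radius_px(expected_error_deg=2.0, pixels_per_degree=40.0,
                            padding=1.5):
        """Radius of the highlighted region, padded beyond the expected error."""
        return expected_error_deg * pixels_per_degree * padding

    def draw_spotlight(draw_soft_circle, gaze_x, gaze_y):
        # draw_soft_circle is a stand-in for whatever drawing primitive the
        # UI framework provides (e.g. a low-opacity radial gradient).
        draw_soft_circle(center=(gaze_x, gaze_y),
                         radius=spotlight_radius_px(),
                         opacity=0.15)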

Solution #4 - Explicit disambiguation 

As with a confirmation dialog, the user is prompted to clarify or confirm their choice when the system is uncertain of their intent.

Disambiguation diagram

Advantages 

  • Handles difficult cases where target clustering is unavoidable 
  • Familiar interaction pattern that can be easy to learn 
  • Potential signature moment if designed well 

Disadvantages 

  • Design and development complexity 

When to use 

Consider this technique when the layout of visual targets can't be controlled and UI dialog features are available. Clarification may use a non-gaze input mechanism such as speech or body gesture. Additionally, context-sensitive behavior can identify and filter candidate targets to minimize dialog complexity.
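
A rough Python sketch of the candidate-gathering step is shown below, assuming rectangular 2D targets and an abstract prompt callback; the radius would come from the expected gaze error, and the prompt itself (a dialog, speech, or gesture) is left abstract.

    # Explicit disambiguation sketch: collect every target whose center lies
    # within the gaze-error radius and only prompt when more than one remains.

    import math

    def candidates_near_gaze(widgets, gaze_x, gaze_y, radius_px):
        """widgets is a list of (name, (x, y, w, h)) pairs."""
        found = []
        for name, (x, y, w, h) in widgets:
            cx, cy = x + w / 2, y + h / 2
            if math.hypot(cx - gaze_x, cy - gaze_y) <= radius_px:
                found.append(name)
        return found

    def resolve_target(widgets, gaze_x, gaze_y, radius_px, prompt_user):
        candidates = candidates_near_gaze(widgets, gaze_x, gaze_y, radius_px)
        if not candidates:
            return None
        if len(candidates) == 1:
            return candidates[0]        # unambiguous, no dialog needed
        return prompt_user(candidates)  # ask the user to pick one candidate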

Solution #5 – Machine learning algorithm 

This technique feeds gaze input and scene information to an algorithm that determines which object the user is looking at. Ideally, the algorithm is tuned to handle a variety of scenarios involving objects of different sizes in different locations, possibly in motion.
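
As a conceptual illustration only, the Python sketch below uses a deliberately simplified scoring heuristic; it is not Tobii's G2OM algorithm, and the weights and inputs are assumptions made for the example.

    # Simplified gaze-to-object scoring: rank candidates by angular distance
    # from the gaze ray and apparent (angular) size, and treat the best-scoring
    # object as focused. Weights are arbitrary placeholders.

    def score(angular_distance_deg, angular_size_deg,
              w_distance=1.0, w_size=0.3):
        return -w_distance * angular_distance_deg + w_size * angular_size_deg

    def focused_object(candidates):
        """candidates is a list of (obj, angular_distance_deg, angular_size_deg)
        tuples precomputed from the scene and the current gaze ray."""
        best, best_score = None, float("-inf")
        for obj, dist_deg, size_deg in candidates:
            s = score(dist_deg, size_deg)
            if s > best_score:
                best, best_score = obj, s
        return best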

 

Advantages 

  • Invisible, respects the visual design 
  • No UI constraints regarding minimum target sizes, clear zones or overlapping targets 
  • No need to tweak design parameters for best results 

Disadvantages 

  • Adds computational load that may require additional resources 
  • Algorithm is a black box and not necessarily portable 

When to use 

When the algorithm is available and computationally suited to the application, this solution is quick to implement and immediately improves the user experience. One implementation of this technique is Tobii’s G2OM (Gaze to Object Mapping) available for Unity applications. 

Summary 

User interaction driven by eye gaze is a natural evolution in humanizing computing experiences. Natural human eye movements and the variable signal quality of eye tracking devices create new challenges for effective UI design. Designers and developers can enhance user success, efficiency, and comfort by implementing UI techniques specific to gaze input.

 


Author

  • Tobii employee

    Lawrence Yau

    Sales Solution Architect, TOBII

    Lawrence is currently a Solution Architect in Tobii's XR, Screen-based, and Automotive Integration Sales team where he shares his excitement and know-how about the ways attention computing will fuse technology's capabilities with human intent. At Tobii, Lawrence is captivated by the numerous ways that eye tracking enables natural digital experiences, provides opportunities to improve ourselves and others, and shifts behavior to achieve more satisfying and sustainable lives. With these transformative goals, he is invested in the success of those who are exploring and adopting eye tracking technologies. He is delighted to share his knowledge and passion with the XR community. His restless curiosity for humanizing technology has taken his career through facilitating integration of eye tracking technologies, developing conversational AI agents, designing the user experience for data governance applications, and building e-learning delivery and development tools. Lawrence received his BE in Electrical Engineering at The Cooper Union for the Advancement of Science and Art, and his MHCI at the Human-Computer Interaction Institute of Carnegie Mellon University.
