
Accessible Locomotion and Interaction in WebXR
Is virtual reality a medium that can be comfortably enjoyed by everyone?
The VR industry has raced to deliver immersive experiences, often at the expense of comfort and accessibility for people who are not gamers. We tried to make VR, specifically the Immersive Web, a medium of the people, today, starting with the most disenfranchised.
Solving design challenges that make products accessible means ultimately making products better for everyone. We got somewhere. Here is the story.
Calling the problem by its name
AccessibleLocomotionWebXR is a rather long title for a hackathon project submission, right?
I did not choose this title as a tongue twister to trip up the sponsors and organizers when announcing the winners at the MIT hackathon awards ceremony; I wasn’t even expecting to win a single one of the three awards my team and I ended up receiving. The title is not a hat tip to my German heritage either. (We Germans do love compound words.)
I named the project in a way that clearly and unmistakably shows a real-life solvable problem by combining the challenge (Accessible), the goal (Locomotion), and the platform/medium (WebXR).
In my previous article, Web VR Gets Real, I took a snapshot of the then-current (2016) VR landscape and the challenges and opportunities for the immersive web.
I pointed out the need for inclusiveness and accessibility on the immersive web and had the opportunity to present my Approaches to Accessibility in WebVR, as a conduit for accessibility, to the W3C immersive web community group at the W3C WebVR Authoring Workshop in Brussels in 2017.
Since writing that article, I’ve built a project that allows visually impaired and blind users to navigate VR spaces with the help of spatial audio, and I have submitted a proposal for a training application commissioned by DOT with accessibility expert Thomas Logan from EqualEntry and immersive sound engineer Ian Petrarca. It’s a web-based VR application that allows blind users to explore and familiarize themselves with dangerous high-traffic Manhattan street crossings in the safety of their own homes prior to facing them in the real world. I am presenting the concept of the app at the A11yNYC Meetup.
A lowest common denominator for interaction in VR
Gaze-based input is an interaction mechanism in VR where the user has a reticle/cursor fixed between the eyes at arm's-length distance that moves with the head’s rotation. Aligning that reticle with an interactable object in virtual space and either gazing at the object for a predefined time period (fuse button) or pressing a viewer-specific lever/button will trigger an action.

When Google put the power of virtual reality experiences into the pockets of billions of smartphone users across the globe with the release of Google Cardboard in 2014, the lowest barrier to entry for this immersive medium became a smartphone and a cardboard-style viewer. Interaction based on gaze was normalized for low-cost VR.
Gaze-based input, however, is a somewhat odd and unnatural way to interface with the medium. Constantly focusing on a reticle/cursor that sits about one meter in front of your eyes, while all other elements of the virtual world sit farther away, causes eye strain. Holding your hand next to the headset to control your VR content results in arm muscle fatigue, the so-called “Gorilla Arm Syndrome”*.
*(Gorilla Arm is a well-known syndrome that dates back to the 1960s. It was first experienced by people working on the MIT TX-2 with Sketchpad, the first graphical application.)
When the first generation of Samsung’s Gear VR headset was released with a multifunctional touchpad attached to the side of the viewer, consumers were opting to buy Bluetooth-connected gamepads as an alternative input method, a far more ergonomically comfortable* option for controlling their VR experiences than holding an arm up for an extended period.
(*This is not a judgment of the comfort of interaction mechanisms within VR; simulating rotation with stick yaw control is known to cause discomfort.)
Shortly afterward, Google introduced the Daydream VR headset (May 2016), which featured a single-handed Bluetooth controller: a laser-pointer-style device with three degrees of freedom (3DoF) that detects orientation, movement, and acceleration. The headset and its controller defined a new standard for interaction in mobile VR. Samsung followed with its own 3DoF VR controller for the next generation of Gear VR headsets in April 2017.
Gaze-based input interaction is the default and the fallback for mobile VR when no other input device (gamepad or 3DoF controller) is available.
In 2015 the MozVR team released the first version of A-Frame: an open-source framework for creating VR experiences for the web, running in the browser. A-Frame opened an exploratory and unbiased blank canvas for creatives to design virtual experiences. A-Frame’s community kept guidelines very loose and did not encourage or enforce “best practices” for how to build VR content and interfaces, in contrast to game engines like Unity3D and Unreal Engine.

A-Frame adopted gaze-based input with a timed trigger (fuse button) as the lowest common denominator for interaction in mobile and low-end VR.
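To make that concrete, here is a minimal sketch of wiring up A-Frame's gaze cursor with a fuse trigger in JavaScript. The `<a-camera>` lookup, the entity id `#menuButton`, and the 1500 ms timeout are illustrative assumptions, not values from the framework defaults or from our project.

```js
// Minimal sketch: attach A-Frame's gaze cursor with a timed fuse trigger.
// Assumes a scene that contains an <a-camera> and an entity with id="menuButton".
const cameraEl = document.querySelector('a-camera');

const cursorEl = document.createElement('a-entity');
cursorEl.setAttribute('cursor', { fuse: true, fuseTimeout: 1500 });  // ms of sustained gaze
cursorEl.setAttribute('position', { x: 0, y: 0, z: -1 });            // ~1 m in front of the eyes
cursorEl.setAttribute('geometry', { primitive: 'ring', radiusInner: 0.01, radiusOuter: 0.016 });
cursorEl.setAttribute('material', { color: '#FFF', shader: 'flat' });
cameraEl.appendChild(cursorEl);

// When the fuse completes, the intersected entity receives a regular click event.
document.querySelector('#menuButton').addEventListener('click', function () {
  console.log('Selected via gaze fuse');
});
```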
The fast-paced VR industry set its focus on embodiment and the capture of natural body/hand motion as methods to interact within virtual space, trying to unleash the true potential of the medium. Using gaze-based input for low-cost head-mounted displays like smartphone-operated viewers (Google Cardboard) and their immersive experiences was never questioned or revisited. Gaze-based input assumed its position as the standard and fallback interaction for mobile and low-end VR. This raises the question: Is this standard inclusive and usable for everyone? The simple answer is: No. Back to the drawing board.
WebXR has to stay true to the World Wide Web
The open-source community of A-Frame and the immersive web inspired me to critically observe the walled-garden app world of high-end VR (Oculus, HTC, and Valve) and see where some aspects of human-centered, empathy-driven design fell out of focus in the rush to achieve excellence in embodied presence.
A-Frame and the WebXR device API are standard building blocks of the World Wide Web and therefore, in my opinion, should stay true to the mission of the web: a space of inclusive and accessible content.
Gaze-based input is a poor lowest common form of input in mobile VR. It demands a free, fairly accurate range of head motion and rotation. The timed trigger (fuse button) is either too slow or too fast, depending on the user's age, health, physical location, or environment (e.g., a moving vehicle).
A clickable trigger on the side of a VR viewer expects the user to have the physical ability to reach and press a button.
In my earlier work, I experimented with alternative inputs that make the most of a smartphone's built-in hardware. I built the A-Frame component shake2show, which fired an event when the phone was tapped on the side, using the gyroscope sensor to read a quick shaking motion of the device. Since Apple’s release of iOS 12.2+, the most important device sensors for web-based VR are disabled in Safari by default and hidden in the settings as options to enable. This breaks the originally seamless integration of WebVR but can be fixed by enabling Motion & Orientation Access once in the settings.
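For illustration, here is a hedged sketch of the kind of device-motion listening behind an approach like shake2show. It is not the published component's source; the threshold value and the emitted event name are assumptions.

```js
// Hedged sketch: emit a 'shake' event when the phone registers a sharp jolt.
// Not the shake2show source; threshold and event name are illustrative assumptions.
AFRAME.registerComponent('shake-listener', {
  schema: {
    threshold: { default: 18 }  // m/s^2 including gravity; tune per device
  },
  init: function () {
    const el = this.el;
    const threshold = this.data.threshold;
    this.onMotion = function (evt) {
      const a = evt.accelerationIncludingGravity;
      if (!a || a.x === null) { return; }  // sensor unavailable or blocked
      const magnitude = Math.sqrt(a.x * a.x + a.y * a.y + a.z * a.z);
      if (magnitude > threshold) {
        el.emit('shake');  // other components can react to this event
      }
    };
    window.addEventListener('devicemotion', this.onMotion);
  },
  remove: function () {
    window.removeEventListener('devicemotion', this.onMotion);
  }
});
```

On iOS, this only works after the user has enabled Motion & Orientation Access in Safari's settings, as noted above.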
Disenfranchisement is a cultural decision
I had the pleasure of working with Jan Kalfus, a Berlin architect who brought deep insights into modern architecture to our hackathon team. He said, “The design of modern spaces considers wheelchair accessibility not because wider door frames, ramps, and elevators are a cheaper solution (they are not); we build inclusive spaces because it conveys the cultural and moral message of the society we want to live in.”
The same consideration applies to the design of virtual environments and spaces. We are at a crossroads where creating VR hardware and controllers that solely focus on the embodiment for the able-bodied sends a message of exclusion.
Creating a viable solution for the medium of virtual reality that considers a group of people who cannot participate in embodied interaction even in real reality is, in my opinion, the best benchmark for accessibility.
I am talking about people like Jared, an inspiring young man with cerebral palsy, rendering him quadriplegic, featured in the assistive technology awareness series, AT in Action.
Accessibility is not a minority issue
When talking about numbers I don’t want to quantify the need for non-embodied interaction in VR. The marketing terms “target groups” or “early adopters” are not relevant.
I want to share a few human facts that might surprise you.
The National Spinal Cord Injury Statistical Center (NSCISC) states that there are over 250,000 people with quadriplegia* (or tetraplegia) living in the United States.
(*Quadriplegia, based on the American Spinal Cord Injury Classification system, is defined as “paralysis of four limbs.”)
Neurological diseases/disorders, viral diseases, and birth defects like Multiple Sclerosis (MS), Poliomyelitis, Spina Bifida, Syringomyelia, and Amyotrophic Lateral Sclerosis (ALS) make up the majority of causes of quadriplegia, but let’s not forget random, instantly life-changing events like accidents that cause traumatic spinal cord injuries.
In fact, every 29.6 minutes somewhere in the United States a person sustains a spinal cord injury and survives. Over seventeen thousand people in America experience an accidental spinal cord injury each year. Whether it is a car accident, a sports injury, a fall, a slip in the bathroom, or a gunshot wound, it can happen to anyone at any time.
Famous cases of quadriplegia include physicist Stephen Hawking (who had amyotrophic lateral sclerosis, ALS), “Superman” actor Christopher Reeve (a horse-riding accident), “Superfly” singer Curtis Mayfield (stage lighting fell on him during a concert), and wrestler Darren Drozdov (who sustained a neck injury in a freak accident during a WWF SmackDown match), just to name a few.
An instant solution
The goal for our hackathon submission was to build a reusable piece of code, a plugin, a component for an existing VR framework on a sharable platform that can be implemented/applied for free right now.
In my presentations about accessibility, I reference nonprofits and charities like SpecialEffect and TheControllerProject that build custom game console controllers for people with a variety of disabilities. Assistive Technology is expensive and highly personalized. Customized game controllers are Bluetooth-enabled, and most VR headsets and smart devices allow the connection of Bluetooth gamepads.
With the release of Mozilla’s Firefox Reality VR/AR browser for the HTC Vive Focus and Oculus Go, web-based VR (A-Frame) was a viable way to quickly deploy our project. Firefox Reality supports the Gamepad API, a standardized browser API that lets you connect and map Bluetooth controllers, and support for it is already built into A-Frame.
In order to maximize impact in the community and raise awareness, we are going to be sharing the component on the A-Frame registry, and we will touch base with the framework stakeholders to find a path to integrate the concept of our component into the master branch.
As a first step, I wanted to decouple immersion from interaction when defining the lowest common interaction mechanism in VR.
You do not need to rotate your head like twelve-year-old Regan in The Exorcist to feel immersed in virtual space. In fact, I would argue that wearing a VR headset with minimal to zero neck motion is just as immersive as wide-angle rotation, which can even increase your chance of dizziness depending on frame rate and latency. A wide field of view extending into your peripheral vision and blur-free lens distortion are enough to give you a sense of presence.
The next step was to look at interaction in its most basic way.
Embodied VR interaction suggests that you pick up things with your hands and draw things with your fingers. In our project, we had to analyze the mechanics of assistive technology (AT) for individuals with different abilities, such as those affected by quadriplegia.
We were looking for the most extreme case, where head movement and voice commands were not an option for control, and decided to focus on Sip-and-Puff, a mechanical input that reads the airflow from a tube the user breathes into.
If the only thing you can control is the airflow to your lungs, it is our goal to enable you to explore and interact in virtual reality.
Sip-and-Puff technology as a baseline input

Sip-and-Puff is quite a complex technology: the person has multiple options to switch and fire events. Sip stands for inhaling air from the tube and Puff for exhaling air into it. There is a range of signal patterns, from a short, firm puff to a smooth, elongated blow, and the same applies to the suction of a sip. Sip-and-Puff controls were originally introduced in the 1960s to control wheelchairs.
My team and I had a long and important discussion about the purpose and the morality of defining VR input methods. Should we mirror the locomotion in VR based on the common input of a wheelchair Sip-and-Puff control unit? Are we forcing people to retrain input mechanisms they have been used to and why?
In the end, we decided that, while we were trying to apply the mechanics our users are familiar with, we also had to introduce a new, fast and easy-to-learn, repeatable, expandable, and sustainable input language. Sip-and-Puff was not supposed to be literally translated into motion; our input had to be applicable to any other AT as well, and it had to be as simple as clicking one binary toggle button (like a Morse code trigger).
So we went a step further and simplified our binary input control component to just look at the Puff of a Sip-and-Puff control.
Our binary control component had to enable exploration and interaction in virtual worlds. So we distinguished two different interaction paradigms and split them into separate input modes:
- Locomotion mode
- Interaction mode
Locomotion mode
Locomotion mode consists of an augmented navigation interface displayed in front of the user's view and fixed to the rotation of the head, like a heads-up display (HUD). It shows buttons that trigger forward movement, backward movement, and left/right rotation.
One important consideration was how to prevent motion sickness when forcing camera rotation in VR. As I mentioned earlier, gamepad controls were critiqued as presence-breaking and nauseating.
Here are some steps we took to avoid VR sickness for our new VR users:
- Linear motion instead of eased acceleration: Smooth motion and easing might look good on desktop screens, but in VR it imitates how we feel when we are intoxicated.
- Movement in dashes rather than continuous motion: To give our users more precision and avoid the feeling of a shopping cart rolling downhill, we decided to translate forward and backward motion into small dashes. It takes more time to navigate, but it keeps the viewpoint locked in place and lets your inner ear reset.
- Turn in fixed angled steps without animated rotating motion: Forced camera rotation is the fastest way to make you sick, and every VR playbook lists it as one of the top five things to avoid. Yet in most social VR environments controller-triggered rotation is an option (e.g. Altspace). The trick is not to animate the rotation but to jump into position; by keeping the rotational angle relatively small (22.5 degrees), people don't lose their orientation. A sketch of how these comfort rules could translate into code follows this list.
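Here is that sketch as an A-Frame component, under stated assumptions: it is not the hackathon component's actual source, it expects a camera rig entity that wraps the camera, and the dash distance is an assumed value; the 22.5-degree snap angle comes from the list above.

```js
// Sketch of comfort-first locomotion: short dashes and instant snap turns,
// with no eased or animated motion. Attach to a camera rig entity.
AFRAME.registerComponent('comfort-locomotion', {
  schema: {
    dashDistance: { default: 0.5 },   // meters per dash (assumed value)
    snapAngle:    { default: 22.5 }   // degrees per turn, as described above
  },
  init: function () {
    // Works whether the rig wraps <a-camera> or an <a-entity camera>.
    this.cameraEl = this.el.querySelector('a-camera, [camera]');
  },
  // sign: +1 moves forward, -1 moves backward
  dash: function (sign) {
    const dir = new THREE.Vector3();
    // getWorldDirection returns the entity's +Z axis; the camera looks down -Z, so negate.
    this.cameraEl.object3D.getWorldDirection(dir);
    dir.y = 0;          // keep motion on the ground plane
    dir.normalize();
    this.el.object3D.position.addScaledVector(dir, -sign * this.data.dashDistance);
  },
  // sign: +1 turns left, -1 turns right; the viewpoint jumps, it is never animated
  snapTurn: function (sign) {
    this.el.object3D.rotation.y += sign * this.data.snapAngle * Math.PI / 180;
  }
});
```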
Interaction mode
Designing the Interaction mode was not as straightforward as the Locomotion mode. Since gaze-based input is the lowest common mechanism for interaction in VR, we had to imitate the gaze towards something while having it feel like an independent, non-intrusive controller.
In Jared's video linked above, you can see him use Sip-and-Puff to navigate software and use programs like Photoshop. He navigates deeply nested user interface panels and selects areas on the screen by switching from a rotating red axis line to a cursor moving up and down the axis. The red axis line is always angled towards a corner of the screen, providing reach to any element on the screen within just a 90-degree rotation.
In VR we don’t have a fixed screen size, and the only boundary is the user's field of view. Our solution was to rotate a ring around the user's head on the z-axis (in gaze direction) and then switch to a back-and-forth swinging motion of the cursor from -50 to +50 degrees, a total sweep of about 100 degrees that keeps the cursor comfortably in sight at all times.
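A simplified sketch of the swinging half of this mechanic follows; the ring-rotation state is omitted, and the sweep speed and pivot setup are assumptions, while the ±50-degree range comes from the description above.

```js
// Sketch: swing a cursor back and forth between -50 and +50 degrees around the
// gaze axis. Attach to a pivot entity parented to the camera that holds the cursor.
AFRAME.registerComponent('swing-cursor', {
  schema: {
    range: { default: 50 },  // degrees to either side of center
    speed: { default: 40 }   // degrees per second (assumed value)
  },
  init: function () {
    this.angle = 0;          // current swing angle in degrees
    this.direction = 1;      // +1 or -1, flips at the edges
  },
  tick: function (time, deltaMs) {
    this.angle += this.direction * this.data.speed * (deltaMs / 1000);
    if (Math.abs(this.angle) >= this.data.range) {
      this.angle = Math.sign(this.angle) * this.data.range;
      this.direction *= -1;  // reverse at the edge of the ~100-degree sweep
    }
    // Rotate the pivot around its local Z axis (the gaze direction).
    this.el.object3D.rotation.z = this.angle * Math.PI / 180;
  }
});
```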
Now that we had defined the two input modes and the mechanics of interaction, we started defining input functions for the interfaces.
Defining binary input methods
Our binary input control consists of two states (ON / OFF) and the timespan t in between the two states.
We need a triggering method (we called it Trigger-A) that executes like the Enter/Return key or the click of a button.
Trigger-A has to be consistent in both input modes and needs to be the easiest and fastest mechanical method.
We need a switching method (we called it Trigger-B) like the tab key or a swiping gesture or like a long-click event.
Trigger-B has to be an equally effortless method but allows for a less reflexive execution than Trigger-A.
And we need a change mode method (we called it Trigger-C) to toggle between the input modes like a click & hold.
Trigger-C is the least frequently used method, but it still needs to be comfortable to execute and has to be distinctly different from the other two.
We mapped the three methods as timed binary input sequences, with Sip-and-Puff mechanics in mind, to navigate and interact within VR (a sketch of how these timing windows can be classified in code follows the list):
- Trigger-A is the Hard Puff — timed sequence fast and short:
ON — t[0ms — 200ms] — OFF
- Trigger-B is the Soft Puff — timed to be longer than Trigger-A but without using up more than ~1 second of air volume — timed sequence long and slow:
ON — t[500ms — 1500ms] — OFF
- Trigger-C is a Continuous Soft Puff that dispatches automatically after a couple of seconds. We didn’t want our users to run out of air, so we kept it in the ~3-second range, but at the same time we needed to avoid triggering it by accident when executing a Soft Puff, so it had to be significantly longer than Trigger-B — timed sequence slow and continuous:
ON — t[2500ms — 3500ms] — OFF
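As a concrete illustration, here is a hedged sketch of classifying a single ON/OFF signal into the three triggers by duration. The event names (binary-on, binary-off, trigger-a/b/c) are assumptions made for this sketch, not the published component's API.

```js
// Hedged sketch: classify a single ON/OFF signal into Trigger-A/B/C by how long it stays ON.
// Event names are illustrative assumptions; the shipped component may differ.
AFRAME.registerComponent('binary-trigger', {
  init: function () {
    const sceneEl = this.el.sceneEl;
    this.pressStart = null;
    this.cTimer = null;

    sceneEl.addEventListener('binary-on', () => {
      this.pressStart = performance.now();
      // Trigger-C dispatches automatically once the signal has been held ~2.5 s.
      this.cTimer = setTimeout(() => {
        sceneEl.emit('trigger-c');
        this.pressStart = null;  // consume the press so OFF does not re-classify it
      }, 2500);
    });

    sceneEl.addEventListener('binary-off', () => {
      clearTimeout(this.cTimer);
      if (this.pressStart === null) { return; }  // already consumed by Trigger-C
      const t = performance.now() - this.pressStart;
      this.pressStart = null;
      if (t <= 200) {
        sceneEl.emit('trigger-a');               // Hard Puff: fast and short
      } else if (t >= 500 && t <= 1500) {
        sceneEl.emit('trigger-b');               // Soft Puff: long and slow
      }
      // Durations outside the windows are ignored as accidental input.
    });
  }
});
```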
Mechanics recap
In order to switch between the two input modes (Locomotion & Interaction), you use Trigger-C.
In Locomotion Mode, you switch selection of the navigation interface buttons with Trigger-B and press the selected interface button with Trigger-A to execute motion functionality.
In Interaction Mode, you switch with Trigger-B between three states of cursor motion — static, rotation of the axes in gaze direction, and swinging of the cursor along the axes from -50 to +50 degrees. Trigger-A operates independently from cursor motion and is always available to dispatch a click event on anything intersecting with the cursor. This mechanic allows the user to align the axis in a direction that makes it possible to execute multiple clicks along the swinging motion of the cursor. Imagine moving a pawn on a board game from one field to the next, or typing into a virtual linear keypad.
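The recap above can be read as a small state machine; a hedged sketch follows, reusing the assumed event names from the earlier timing sketch. The hud-* and cursor-* events are hypothetical hooks, not the published component's API.

```js
// Hedged sketch of the mode switching described above. All event names are
// illustrative assumptions carried over from the timing sketch.
AFRAME.registerComponent('mode-manager', {
  init: function () {
    const sceneEl = this.el.sceneEl;
    this.mode = 'locomotion';  // start in Locomotion mode

    sceneEl.addEventListener('trigger-c', () => {
      this.mode = (this.mode === 'locomotion') ? 'interaction' : 'locomotion';
      sceneEl.emit('mode-changed', { mode: this.mode });
    });

    sceneEl.addEventListener('trigger-b', () => {
      // Locomotion: cycle the highlighted HUD button.
      // Interaction: cycle cursor state (static / axis rotation / swing).
      sceneEl.emit(this.mode === 'locomotion' ? 'hud-next' : 'cursor-next');
    });

    sceneEl.addEventListener('trigger-a', () => {
      // Locomotion: press the selected HUD button.
      // Interaction: click whatever currently intersects the cursor.
      sceneEl.emit(this.mode === 'locomotion' ? 'hud-press' : 'cursor-click');
    });
  }
});
```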
Our binary input control works with Sip-and-Puff, with a single button, and with any Bluetooth-enabled assistive technology or customized game controller. All the user needs to do is map one controller output to the headset or smart device over Bluetooth.
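A hedged sketch of feeding that single output into the binary signal via the Gamepad API is below; the button index and the binary-on/binary-off event names are assumptions carried over from the earlier sketches.

```js
// Hedged sketch: poll the Gamepad API each frame and turn one button into the
// ON/OFF signal consumed by the timing sketch above.
AFRAME.registerComponent('gamepad-binary-source', {
  schema: {
    button: { default: 0 }  // which gamepad button acts as the single toggle
  },
  init: function () {
    this.wasPressed = false;
  },
  tick: function () {
    const pads = navigator.getGamepads ? navigator.getGamepads() : [];
    for (let i = 0; i < pads.length; i++) {
      const pad = pads[i];
      if (!pad) { continue; }
      const btn = pad.buttons[this.data.button];
      const pressed = !!(btn && btn.pressed);
      if (pressed && !this.wasPressed) { this.el.emit('binary-on'); }
      if (!pressed && this.wasPressed) { this.el.emit('binary-off'); }
      this.wasPressed = pressed;
      break;  // this sketch only reads the first connected gamepad
    }
  }
});
```

Because A-Frame events bubble up to the scene by default, a scene-level listener like the timing sketch above receives these signals regardless of which entity the source component is attached to.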
Building the Demo
Now that we had our binary component ready, we had to showcase how it worked. My initial idea was to create a VR walkthrough of a newly constructed home, but immersive 3D artist Pilar Aranda raised the important point that issues with wheelchair accessibility are more present in older buildings, which were not built to modern building-code guidelines, and that we should focus on a real, working-class apartment. With Jan's 1940s Berlin apartment as the basis for our virtual environment, we added Wayfair furniture from their new 3D Model API to decorate the rooms. We added a wheelchair underneath the user camera and mapped the binary input to the touchpad of the 3DoF controller while removing all other button functionality and tracked-controller output. The only input signal linked in our scene was touching the trackpad [ON] and not touching the trackpad [OFF] of the 3DoF controller.
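A hedged sketch of that mapping: assuming A-Frame's tracked-controls events for touch-capable buttons, touching and releasing the trackpad can stand in for the ON and OFF states (again, the binary-on/binary-off names are assumptions from the earlier sketches, not the demo's actual source).

```js
// Hedged sketch: map touching / releasing the controller trackpad to ON/OFF.
// Attach to the controller entity; event wiring assumes tracked-controls
// emits touchstart/touchend for touch-capable buttons.
AFRAME.registerComponent('trackpad-binary-source', {
  init: function () {
    const el = this.el;
    el.addEventListener('touchstart', function () { el.emit('binary-on'); });
    el.addEventListener('touchend',   function () { el.emit('binary-off'); });
  }
});
```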
We built an experience that can be navigated and explored by quadriplegic users: they can inspect furniture that may not be the size or height they want in their living space, furniture that might restrict their wheelchair navigation. The demo also serves as an empathetic insight into how people in wheelchairs experience an environment and how it feels to live with limited embodied agency and interaction in real life.
We showcased our prototype at the Post Hackathon Expo and were happy to see that, thanks to its simplicity, getting used to the mechanics of the binary input control generally took just a few minutes for users of any age group.
433 Hackers — 110 Team submissions: AccessibleLocomotionWebXR is a 3x winner at the world’s largest, most diverse XR Hackathon. January 17–21st, 2019 @ the MIT Media Lab, Boston
- Wayfair Way-more (timestamp 1:46:00)
- Best Application For Accessibility (timestamp 2:10:00)
- Best Use of an HTC Vive Focus (timestamp 2:19:30)
Biggest thanks go out to my rockstar team: Jan Kalfus, Pilar Aranda, and Selena de Leon, and to Anthony Scavarelli and Winston Chen for their last-minute help to the rescue.
Next Steps
We partnered with Thomas Logan and the EqualEntry network and are reaching out to quadriplegic testers; component usability testing and fine-tuning lie ahead. Very exciting!
If you enjoyed this project and the thoughts that went into creating it, have any feedback, or if you are a developer and would like to contribute to the open-source A-Frame component, I’d love to hear from you. Say hello at contact@rolanddubois.com or connect on LinkedIn.