Missing Time: The Metaverse and Human-Computer Interaction | by Stanislav Stankovic | November, 2022

Hugo Gernsback wearing a head mounted display.

I’m old enough to remember when Snow Crash by Neal Stephenson originally hit bookstore shelves. It was the early 90s and I was in my teens. Our computers still gave out a sequence of high-pitched noises whenever they tried to connect to a network. But cyberpunk was a brand new thing, our vision of a hyper-connected future in which we were supposed to spend a good chunk of our lives online.

Fast forward thirty years. It is the year 2022 and the term Metaverse, coined by Mr. Stephenson, is now a product name. That cyberspace vision has been co-opted by one of the largest global corporations. In a sense, we are living in the future we were promised.

This future is once again tied to concepts that technology developers have been throwing around for decades, namely VR goggles and immersive 3D environments, bundled with new controller devices. The allure of this vision is that it will redefine the way we interact with technology, making it more intuitive and, in the process, making us more productive.

By definition, any interactive system, and that includes every VR environment, consists of input devices, the environment itself, and display devices, which together form a complete technological loop that allows the user to interact with said environment. The keyboard of my laptop, the text editor I am using to type this text, and the laptop screen are one such loop. They are good enough for the purpose of producing this document, and I’m perfectly happy using them. No wonder: they are the result of several decades of incremental improvements. Then again, they did not start out in their present form. Like anything, there is room for innovation and even disruption. VR, AR, XR, etc. promise exactly this, i.e. disruption in the way we interact with our technology.

As always, the final verdict on any such promise will be passed by the end users: the kind of ordinary people who are not enamored with technology itself but would rather use it as a means to an end. The quality of the User Experience (UX) will make or break any such offering. On the other hand, widespread adoption of any new Human-Computer Interaction (HCI) paradigm will have significant consequences for the way we design UX.

In this article, I’m not going to talk about simulation sickness. It’s a phenomenon that has drawn a lot of attention from both critics and proponents of the technology, and one that a lot of development effort is already focused on.

Instead, I’m going to talk about three other, very important human factors related to the UX of these new, exotic devices. These factors give rise to some fundamental conceptual problems in designing new methods of human-computer interaction. I believe that anyone working in this field should be aware of them, and I will try to explain why in the following text.

The first of these three factors is what is known in HCI as Time to Disengage. This enigmatic name alludes to something that is very simple and, at the same time, anything but trivial. We build systems in order to interact with them. Ideally, UX is about making those interactions more comfortable, efficient and intuitive. However, we continue to live in the physical world. While we are immersed in the cyberspace of our technology, we are still embedded in the real world. Seamless switching between the two is one of the most important and yet most neglected tasks.

While I am writing this article, I am concentrating on my work. I am in flow. I try my best to ignore my surroundings. However, there may be things in the real world that I need to pay attention to. It could be a very simple thing. At some point my colleagues may ask me about something. My son may need help opening the jar of cookies. Or the electric kettle could start boiling. The cat could jump on the desk. All kinds of things can happen. There could be a fire in the house. The time it takes my attention to transfer from the laptop screen to the real world around me is measured in milliseconds. All I need to do is move my eyes in the direction of the potential distraction. Returning to the virtual world on the laptop screen requires the same amount of effort.

Compare this to wearing VR goggles. The device effectively blinds me to the real world. The time required to emerge from the virtual world into the real one can again be fairly brief. However, it requires a great deal more effort. It requires that I use my hands to pull the goggles down from my eyes. It takes an extra second or two if my hands are occupied by a fancy 3D controller. Even if I am perfectly used to these motions, no matter how fast I am, it will always be an order of magnitude slower than simply looking away from a screen. The same amount of effort, or even more, is required to get back into the virtual world.

Even minor interruptions while using VR gear can thus lead to major annoyances. This may limit the number of situations in which these tools can practically be used, in turn limiting their adoption by users.

Drawing with a ballpoint pen on a plain piece of printer paper feels different than painting with a watercolor brush on aquarelle paper. Coloring with a Crayola crayon feels different than using a marker. Cutting a piece of raw meat feels dramatically different than cutting bread or slicing a piece of cheese. This is something we take for granted, but it is extremely important.

The forces exerted by our muscles on a device are reflected back to the nerve endings in our skin. This complex interplay of forces is essential to how we perform fine motor tasks. It forms a self-reinforcing sensory-motor loop in which our brain adjusts the speed and force of our movements in order to properly perform a delicate task. What we are feeling is, in effect, passive haptic feedback.

Sensory-motor loops in VR.

There is nothing specially built into the handle of a knife to transmit this force response to our fingers. What we are feeling is the result of fundamental laws of physics. It is simply the passive resistance of a solid object to the application of an external force, due to its inertia. Thus, it is a passive form of feedback. Each object gives unique and recognizable haptic feedback through its form, structure and context of use.

Now consider gesture-based interfaces. They were all the rage on game consoles a decade or so ago, back when the Nintendo Wii and Microsoft Kinect ruled the world. They are emerging again with all the new hardware and software, such as Google Tilt Brush.

Google Tilt Brush.

These devices and applications attempt to simulate the behavior of various real-world tools while operating in a virtual world devoid of the physical laws that our bodies can feel. In the virtual world, you are handling an immaterial gun or paintbrush. The space between your fingers is empty. There is no physical desk or canvas to lean on.

The virtual world lacks appropriate passive haptic feedback. Sometimes it doesn’t really matter. Sometimes it is actually to the user’s benefit. No one misses real recoil when shooting a virtual gun. The fun in most shooter games isn’t in a faithful representation of the weapons’ properties. However, as we move more and more towards so-called professional applications of VR and AR, these things will start to matter more and more. Painting something in a virtual void feels quite different than carving something out of physical clay or wood.

These things matter. Working without passive haptic feedback is like trying to operate under local anesthesia. Digital image processing and painting applications have spent decades developing ways to mimic the important properties of drawing with a variety of physical tools. A big part of this is about mimicking appropriate passive haptic feedback. Photoshop, for example, has a sophisticated set of brush settings that let the user adjust the speed with which virtual ink flows and spreads around the virtual tip of a virtual tool.
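Just to make that idea concrete, here is a rough sketch, in Python, of how such settings might shape a single brush dab. The parameter names and numbers are invented for illustration; this is not how Photoshop actually implements its brushes.

```python
from dataclasses import dataclass

@dataclass
class BrushSettings:
    # Hypothetical parameters, loosely inspired by typical painting apps.
    flow: float    # 0..1, how quickly ink is deposited per dab
    spread: float  # 0..1, how far ink bleeds beyond the brush tip
    radius: float  # brush tip radius in pixels

def dab_alpha(settings: BrushSettings, distance: float, pressure: float) -> float:
    """Opacity contributed by one dab at `distance` pixels from the tip centre.

    The falloff widens with `spread`, and the overall strength scales with
    `flow` and stylus `pressure` -- a crude stand-in for how a real brush
    deposits more ink when pressed harder and held longer.
    """
    effective_radius = settings.radius * (1.0 + settings.spread)
    if distance >= effective_radius:
        return 0.0
    falloff = 1.0 - distance / effective_radius  # linear falloff from the centre
    return min(1.0, settings.flow * pressure * falloff)

# Example: a soft, wet brush vs. a hard, dry one at the same spot.
wet = BrushSettings(flow=0.3, spread=0.8, radius=10)
dry = BrushSettings(flow=0.9, spread=0.1, radius=10)
print(dab_alpha(wet, distance=8.0, pressure=0.7))
print(dab_alpha(dry, distance=8.0, pressure=0.7))
```

Tuning a handful of numbers like these is, of course, only a visual approximation of what a physical brush tells your hand.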

The human mind, on the other hand, is a remarkably malleable thing. Humans are able to adapt to the quirks of whatever new device is put in front of them.

Leap Motion gesture interface.

I make mistakes while typing. I’m a little clumsy and I only use three fingers on each hand. However, to type anything, even a mistake, I need to press a physical key. There is no ambiguity in pressing a button. Once a button is pressed, whether by intention or by mistake, the system can perform whatever task the software has assigned to that button.

If we go back to the gesture interface from our earlier example, we’ll see that things aren’t always that simple. Sure, you can have a sophisticated system for recognizing the user’s hand gestures, but how does the system know whether the user is waving a hand to flip the pages of a virtual book or waving for some other, unrelated reason? Issuing commands by gesture may seem intuitive, but the simpler the gestures are, the more difficult it is for the system to discern whether a gesture was made intentionally or accidentally. The more complex the gestures, the more difficult it is for the user to execute them properly.
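To get a feel for what designers end up doing about this, here is a rough sketch in Python of the kind of heuristic involved. It accepts a horizontal “page-flip” wave only if the hand has moved far enough, fast enough, and roughly along the expected axis. All the thresholds are invented for illustration, and no particular tracking SDK is assumed.

```python
from dataclasses import dataclass

@dataclass
class HandSample:
    t: float  # timestamp in seconds
    x: float  # horizontal hand position in metres
    y: float  # vertical hand position in metres

def is_page_flip(samples: list[HandSample],
                 min_distance: float = 0.25,   # metres travelled horizontally
                 min_speed: float = 0.6,       # metres per second
                 max_off_axis: float = 0.10) -> bool:
    """Heuristically decide whether a hand trajectory was an intentional
    horizontal 'flip' wave rather than an incidental movement.

    The thresholds here are made up; real systems tune them from user data
    (or sidestep the problem entirely with a clutch button).
    """
    if len(samples) < 2:
        return False
    dx = samples[-1].x - samples[0].x
    dy = samples[-1].y - samples[0].y
    dt = samples[-1].t - samples[0].t
    if dt <= 0:
        return False
    travelled_far = abs(dx) >= min_distance   # big enough to be deliberate
    fast_enough = abs(dx) / dt >= min_speed   # quick enough to be a wave
    on_axis = abs(dy) <= max_off_axis         # mostly horizontal motion
    return travelled_far and fast_enough and on_axis

# A slow drift of the hand fails the speed test; a brisk sideways wave passes.
drift = [HandSample(0.0, 0.0, 0.0), HandSample(1.5, 0.3, 0.02)]
wave = [HandSample(0.0, 0.0, 0.0), HandSample(0.3, 0.3, 0.02)]
print(is_page_flip(drift), is_page_flip(wave))
```

Every threshold in a sketch like this is a trade-off: loosen it and accidental movements trigger commands, tighten it and deliberate gestures get ignored.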

This problem is not insurmountable, but it is real. Some systems circumvent it by having users hold a small device equipped with a button. Pressing the button signals intent. However, this approach has its own limitations. First of all, it is not a pure gesture approach; it still requires interaction between the user and a dedicated device. Second, it does not help to differentiate between two different yet similar inputs.

Even well-established devices have this problem. Consider the swipe command on a touch screen. Swiping up and down scrolls the contents of the virtual page. Swiping left or right flips the virtual page or switches the tab in the browser. Depending on how you hold your phone, your motion is rarely going to be entirely horizontal or vertical. If you’re like me, your swipe is almost always somewhat skewed. Every time the system misinterprets it and acts against my intended purpose, it causes me no end of frustration.
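The underlying classification problem can be sketched in a few lines of Python. One defensible choice, illustrated below with made-up thresholds, is to treat a swipe that falls too close to the diagonal as ambiguous rather than forcing it into “horizontal” or “vertical”.

```python
import math

def classify_swipe(dx: float, dy: float,
                   min_length: float = 40.0,    # pixels; shorter moves are taps or noise
                   tolerance_deg: float = 30.0) -> str:
    """Classify a touch gesture from its displacement (dx, dy) in pixels.

    Returns 'horizontal', 'vertical', or 'ambiguous'. The angular tolerance is
    an illustrative value; a skewed swipe near 45 degrees is deliberately left
    ambiguous instead of being guessed, which is one way to avoid acting
    against the user's intent.
    """
    length = math.hypot(dx, dy)
    if length < min_length:
        return "ambiguous"
    # Angle from the horizontal axis, folded into [0, 90] degrees.
    angle = math.degrees(math.atan2(abs(dy), abs(dx)))
    if angle <= tolerance_deg:
        return "horizontal"
    if angle >= 90.0 - tolerance_deg:
        return "vertical"
    return "ambiguous"

# A slightly skewed scroll still reads as vertical; a 45-degree drag does not.
print(classify_swipe(dx=15, dy=120))   # vertical
print(classify_swipe(dx=80, dy=70))    # ambiguous
```

Whether the right answer to an ambiguous swipe is to do nothing, to guess, or to ask is itself a UX decision, and it is exactly the kind of decision gesture interfaces multiply.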

None of these three things represents an insurmountable obstacle to the development of VR. Technology developers can reduce their impact or work around them. In some cases, it is even possible to turn them to the benefit of potential users.

However, one should be aware of these and many other such notions when working in this field. Failure to recognize their importance can lead to serious design failures. These three concepts are largely independent, although they may amplify each other’s negative effects. For example, the combination of a lack of passive haptic feedback and false input has historically been a major impediment to the development of gesture-based interfaces. The Microsoft Kinect remains a gimmick, while the gamepad, keyboard, and mouse remain the mainstays of gaming today.
