David Moloney, Senior Vice President and Chief Technology Officer, Movidius

Sales of wearable devices are expected to exceed US$8 billion by 2018, with over 130 million units globally, according to Markets and Markets.

However, while wearables are being touted as the next “must have” in consumer electronics, signs of “wearer fatigue” are becoming apparent. Endeavor Partners reports that one-third of American consumers who have owned a wearable product stopped using it within six months, rising to over 50 percent in the case of fitness bands.

Setting aside the hype, what is the reality for wearables as a category? Are consumers abandoning wearable devices in droves, has the hype-cycle already peaked, or will the purchase and daily use of wearables continue to rise?

Bridging the Gap between Wearer Excitement and Wearer Fatigue

Connected devices in the “Internet of Things” can sense temperature, acceleration and a plethora of other parameters from our environments. Currently one in six Americans owns at least one of a multitude of wearables, and hobbyists can even build their own devices based on the MetaWear Kickstarter. These devices tend to have an accelerometer, Bluetooth LE and a vibration device but little else, limiting their utility to applications like fitness trackers, heart rate monitors and key finders.

Is this “wearer fatigue” due to a focus on applications that don’t provide specific or enduring benefits to consumers?

Wearables must evolve to truly augment user experiences, provide enduring value to consumers and become “sticky”. There is a huge opportunity to add value by having our everyday objects recognize us or other people, understand our environment and provide us with useful services as and when we need them. Four essential challenges must be overcome for this evolution to occur.

First, the security and social acceptability of “in your face” devices like Google Glass are a huge issue. Despite a legion of early adopters, Google Glass has been widely criticized for its social intrusiveness before even hitting the mainstream. In contrast, other wearable devices like GoPro cameras appear to be more socially acceptable. Such devices have an always-on view of the world with built-in connectivity to the user’s phone, tablet or even the cloud, and yet are still considered acceptable because they are “dumb” capture devices.

Second, wearable devices must offer a superior user experience. Unobtrusive wearables present design challenges because they often lack a screen, or indeed any control beyond an on-off switch. Devices like Moto-X show how natural language interfaces can allay privacy and social acceptability concerns, as a spoken command is required in order to capture an image.

Third, wearables must be safe to use. Humans are generally poor multitaskers and, according to research, multitasking can reduce productivity by up to 40 percent. Anybody who has tried to text while walking, or to use Google Glass while performing other tasks, will attest to the effects of divided attention. From this point of view, a wearable device that enhances our ability to multitask without overwhelming our already heavily taxed visual cortex with yet more data is highly desirable. Filtering visual data in a wearable device before it is presented to us has the added benefit of reducing overall energy consumption, as less data has to be transmitted wirelessly between personal devices.

Finally, battery life is paramount. Without sufficient battery life, what the consumer is offered is a demo and not a sustainable user-experience in the digital world. Here the costs of performing processing in wearable cameras, phones or tablets and the cloud will be compared to show how the ultimate in power-efficiency can be achieved.

Given the above challenges, there is a real opportunity to zero in on vision technologies to enrich user experiences, enable safety and privacy and tackle the battery life problem.

Bringing Visual Awareness to Wearables

How can we design a wearable assistant that can recognize situations, objects, people, etc. and inform us about them using metadata rather than a stream of images?

Current smartphone application processors (APs) are limited in terms of resolution, frame-rate, power and programmability. Implementing relatively simple computer vision pipelines on conventional APs is highly complex, requiring the coordination of multiple ARM processors, DSPs and GPUs, all of which must share access to data in memory. The reality is that developing apps for such platforms is a huge challenge, even with vendor supplied libraries and APIs.

To meet this challenge a new category is emerging, in the form of vision processors. Also known as Vision Processor Units (VPUs), these provide the computational means to offer “human” vision at ultra-low cost and power. This enables new applications such as image search, merged video and graphics, robotics and a range of emerging apps that require exceptional computational capabilities and memory bandwidths.

The result is life-enhancing, and potentially life-saving, notifications delivered to our mobile devices, including warnings of potential threats to our well-being. Scenarios enabled by such a wearable device could include recognizing a person that the user met previously, alerting the user to an oncoming vehicle, or alerting the user to a red pedestrian light so that he or she doesn’t step off the curb.

These scenarios all require a view of the world that cannot be provided by a smartphone or tablet. They call instead for a wearable button camera with embedded, local processing capability that only alerts the user when a user-definable trigger event or sequence occurs. The benefits of this approach are not only timely situational awareness, but also minimal battery drain, as a wearable camera that ships metadata moves around 1000x less data than one that ships video frames.

The “pixels stay in the camera” paradigm guarantees privacy, is energy efficient, conserves precious data bandwidth, enables low-latency services, makes for a more robust and scalable system and ultimately makes most sense.
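To give a feel for the difference in data volume, here is a minimal Python sketch of a trigger filter that emits a compact metadata record instead of a frame. Everything in it is an illustrative assumption rather than a description of any actual product: the function names, the event labels, the JSON record layout, the confidence threshold and the choice of an 8-bit grayscale QVGA frame as the point of comparison.

```python
import json

# Assumption: one raw QVGA frame at 8 bits per pixel (grayscale).
QVGA_WIDTH, QVGA_HEIGHT = 320, 240
BYTES_PER_PIXEL = 1


def raw_frame_bytes(width=QVGA_WIDTH, height=QVGA_HEIGHT,
                    bytes_per_pixel=BYTES_PER_PIXEL):
    """Size in bytes of one uncompressed frame."""
    return width * height * bytes_per_pixel


def make_alert(label, confidence, timestamp_s):
    """Encode one detection as a compact JSON record (hypothetical layout)."""
    record = {"event": label, "conf": round(confidence, 2), "t": timestamp_s}
    return json.dumps(record, separators=(",", ":")).encode("utf-8")


def alerts_for(detections, triggers, min_confidence=0.8):
    """Yield alerts only for detections matching a user-defined trigger."""
    for label, confidence, timestamp_s in detections:
        if label in triggers and confidence >= min_confidence:
            yield make_alert(label, confidence, timestamp_s)


# Hypothetical output of an on-device vision pipeline: (label, confidence, time).
detections = [
    ("tree", 0.95, 0.0),
    ("red_pedestrian_light", 0.91, 0.2),
    ("oncoming_vehicle", 0.55, 0.4),  # below the confidence threshold
]
triggers = {"red_pedestrian_light", "oncoming_vehicle"}

alerts = list(alerts_for(detections, triggers))
frame_size = raw_frame_bytes()
for alert in alerts:
    ratio = frame_size / len(alert)
    print(alert.decode(), f"is about {ratio:.0f}x smaller than one raw frame")
```

Only the detection that matches a trigger with sufficient confidence leaves the device. The exact ratio depends on how the metadata is encoded, so the printed figure is indicative only.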

Why Should the Pixels Stay in the Device?

Google Glass, the thought leader in terms of wearable devices, is powered by a 570 mAh battery. A 30-day wearable use case based on the same battery requires some analysis, as current consumption would need to average less than 0.8 mA (570 mAh spread over 720 hours) for the desired battery life to be achieved. That budget has to cover everything: the camera, the display, the application processor running Android, audio input-output, accelerometer, touchpad and wireless connection (Bluetooth LE alone draws 10 mA at 100 percent duty cycle).
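The budget can be checked with a few lines of Python. This is a back-of-the-envelope sketch that reads the quoted battery and Bluetooth LE figures as a capacity in mAh and a current in mA. It assumes the full rated capacity is usable and ignores sleep currents and conversion losses. The function names are illustrative.

```python
BATTERY_CAPACITY_MAH = 570.0   # Google Glass battery capacity, from the text
TARGET_DAYS = 30               # desired battery life, from the text
BLE_ACTIVE_CURRENT_MA = 10.0   # Bluetooth LE at 100% duty cycle, from the text


def average_current_budget_ma(capacity_mah, days):
    """Average current (mA) that drains the battery in exactly `days` days."""
    hours = days * 24
    return capacity_mah / hours


def max_duty_cycle(budget_ma, active_current_ma):
    """Largest fraction of time a load could be active within the budget,
    assuming (optimistically) that it draws nothing when idle."""
    return budget_ma / active_current_ma


budget = average_current_budget_ma(BATTERY_CAPACITY_MAH, TARGET_DAYS)
ble_duty = max_duty_cycle(budget, BLE_ACTIVE_CURRENT_MA)

print(f"Average current budget: {budget:.2f} mA")          # 0.79 mA
print(f"BLE duty cycle using the whole budget: {ble_duty:.1%}")  # 7.9%
```

Even if the radio were the only consumer, it could be active less than 8 percent of the time, which is why the amount of data sent over the wireless link matters so much.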

Locality of processing is a key concern, so let’s compare the cost of executing computer vision algorithms in the wearable device and transmitting only metadata with the cost of transmitting each frame wirelessly to a phone application processor.

Figure 1. Distribution of Vision Processing Load Determines Power Efficiency

What would be the comparative energy cost of processing video frames locally and sending metadata, as opposed to transmitting the raw video frame to a linked smartphone or the cloud?

Current computational video processing in 28nm yields 100 to 1000 megapixels per second per watt, which works out to as little as 2 mW for a QVGA video stream at 30fps. By contrast, the various wireless transmission standards (802.11n, 802.15.4, Bluetooth EDR and Bluetooth LE) require from six to over 70 mW for the same QVGA stream, without considering protocol overhead. Transmission is therefore 3x to 35x less efficient than local processing, and that is before counting the power used to wake up the application processor in the phone, which is typically 10x less computationally efficient than a computational video processor.

As shown, performing vision processing locally in the wearable device is preferable to transmitting video data to the smartphone or onwards to the cloud, wherever local processing makes sense for the application.
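The arithmetic behind this comparison can be sketched in Python as follows. The efficiency range and the radio power range are taken from the text. The function name and the treatment of the radio figures as flat per-stream power numbers are simplifying assumptions for illustration.

```python
QVGA_PIXELS = 320 * 240   # pixels per QVGA frame
FRAME_RATE_FPS = 30       # frames per second, from the text

# Figures quoted in the text.
EFFICIENCY_RANGE_MP_PER_S_PER_W = (100.0, 1000.0)  # 28nm vision processing
RADIO_POWER_RANGE_MW = (6.0, 70.0)                 # QVGA at 30fps, no protocol overhead


def compute_power_mw(pixels_per_frame, fps, megapixels_per_s_per_watt):
    """Power (mW) needed to process a stream at the given efficiency."""
    megapixels_per_s = pixels_per_frame * fps / 1e6
    return megapixels_per_s / megapixels_per_s_per_watt * 1000.0


low_eff, high_eff = EFFICIENCY_RANGE_MP_PER_S_PER_W
best_case_mw = compute_power_mw(QVGA_PIXELS, FRAME_RATE_FPS, high_eff)
worst_case_mw = compute_power_mw(QVGA_PIXELS, FRAME_RATE_FPS, low_eff)

print(f"Local processing: {best_case_mw:.2f} mW (best) "
      f"to {worst_case_mw:.2f} mW (worst)")
for radio_mw in RADIO_POWER_RANGE_MW:
    ratio = radio_mw / best_case_mw
    print(f"Radio at {radio_mw:.0f} mW costs {ratio:.1f}x the best-case compute power")
```

The 3x to 35x figure in the text corresponds to the best-case efficiency with the compute power rounded down to 2 mW. At the lower end of the quoted efficiency range the margin narrows, so the benefit depends on how efficient the vision processor actually is.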

Enabling Developers

In typical image/video pipelines, application developers require access to the entire video stream at VGA resolution and 30fps. In practice, for many computer vision applications a slower frame rate of approximately 5fps may suffice, and the application may only need to look at a small section of the image at a fairly low resolution. A study of modern image sensors by Microsoft Research has revealed two energy-proportional mechanisms that are present in current image sensors but unused in mobile systems.

First, choosing the optimal clock frequency reduces power by up to 30 percent in video applications. Second, entering low-power standby between frames saves an additional 40 percent in power. Experimental evaluation of these latent mechanisms in current image sensors suggests that major savings can be achieved, for instance a 36 percent reduction in active power and a 95 percent reduction in standby power for image registration (used in image mosaicking and depth estimation).

Exposing these power-saving modes in an API would provide application developers the energy-proportionality they need (fewer pixels/second = less energy used), while hiding unnecessary details and still providing a high level of quality and accuracy.
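As a rough illustration of what such an interface might look like, the Python sketch below models a capture request and estimates its relative energy cost. It is purely hypothetical: `CaptureRequest`, `relative_energy` and the strictly linear "energy scales with pixels per second" model are assumptions made for this example, not an existing sensor API, and real sensors have fixed overheads that a linear model ignores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaptureRequest:
    """Hypothetical request: what the application actually needs to see."""
    width: int    # pixels
    height: int   # pixels
    fps: float    # frames per second

    def pixels_per_second(self) -> float:
        return self.width * self.height * self.fps


# Baseline from the text: the full VGA stream at 30fps.
FULL_STREAM = CaptureRequest(width=640, height=480, fps=30)


def relative_energy(request: CaptureRequest,
                    baseline: CaptureRequest = FULL_STREAM) -> float:
    """Energy relative to the baseline, assuming ideal energy proportionality
    (fewer pixels/second = proportionally less energy)."""
    return request.pixels_per_second() / baseline.pixels_per_second()


# Example: a small, low-resolution region of interest sampled at 5fps.
roi = CaptureRequest(width=160, height=120, fps=5)
print(f"Pixel rate: {roi.pixels_per_second():.0f} px/s, "
      f"about {relative_energy(roi):.1%} of the full VGA stream")
```

The point of the sketch is the shape of the interface: the developer states the resolution and frame rate the algorithm needs, and the clock frequency and standby decisions stay hidden behind the request.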

Unleashing the Next Wave of Wearable Experiences

Figure 2 summarizes a number of use cases that are enabled by the increased efficiency of processing in a power-constrained device.

Figure 2. Visual Awareness Use Cases

It is clear that computer vision technologies can greatly enhance user experiences by enabling true visual awareness in wearable devices. This approach is particularly relevant to wearables, which must “tell us something we don’t already know” in order to create enduring value and overcome “wearer fatigue.” The category requires an always-on computer vision processor with a radically innovative architecture, one that provides intelligence in real time by performing visual sensing as close to the sensor as possible.

Generally, processing locally enables computer vision applications to leverage our mobile devices and the cloud while conserving energy and bandwidth across the spectrum. It has the added benefit of minimizing latency, which is essential for safety-critical information as well as for gaming and interactive services. In fact, the proposed approach holds as much for camera subsystems within phones and tablets as it does for wearables.