Picture this: four people cradled inside a luxuriously appointed cabin, being transported safely to their destination in a self-driving vehicle. The autonomous car takes over most, if not all, of the driving, leaving passengers more time to spend as they see fit—with some already starting work on their mobile devices via in-car connectivity during slow-moving traffic. Throughout the city, the ride-sharing public is doing the same: hailing self-driving cars to get from place to place.

Fantasy? No: this is the vision carmakers hold for the self-driving vehicle of the future, in which the automobile is not simply a means of transport but also a cloistered space for retreat or private musing. The timetable for realizing this vision: sometime in the 2030s, the point at which fully self-driving cars are expected to become available on the market.

To get there between now and then, carmakers—Toyota, BMW, Volvo, Mercedes-Benz, Nissan and Ford, among others—are stepping up efforts and renewing focus on artificial intelligence (AI), the capability of machines to demonstrate intelligent behavior, in this case as it relates to cars.



The world of deep learning

A subset of the field of AI is deep learning, a branch of machine learning that uses algorithms to model high-level abstractions in data by processing multiple layers of information, in an attempt to emulate the workings of the human brain. Progress in deep learning has been possible thanks to advanced algorithms alongside the development of new and much faster hardware based on multiple graphics processing unit (GPU) cores instead of traditional central processing units (CPUs). These new architectures allow faster training phases as well as more accurate results.

The concept of deep learning is not new, dating to as early as 1943, when a mathematics- and algorithm-based computational model for neural networks called threshold logic was created. But a breakthrough occurred in 2015, when Microsoft software succeeded in identifying with startling accuracy the contents of 100,000 test images from ImageNet, a vast image database of people and objects, including plants, animals and structures, that scientists throughout the world employ to teach image recognition to their software. The Microsoft program scored a 4.94% error rate, lower than the 5.10% error rate of humans.

Image recognition is particularly important in the abstract ontological world of deep learning, which hopes to mimic the extremely complex reasoning process of the human brain. In the case of deep learning, the brain’s massive neuron system is replaced by the computer’s own convolutional neural network comprising hundreds of millions of connection points.

And in place of the brain’s symbolic logic system, machine-learning techniques are deployed that help the computer become proficient in speech recognition, computer vision and language processing.

Instilling deep learning is a highly complex, multi-layered process comprising two key phases.

In the first phase, the “training” stage, inputs such as sounds and images are fed to the computer’s neural network so that it “learns” to recognize objects and see patterns, and then discovers how to identify those objects in a variety of situations.

In the second stage, each input is broken down into layers, as the neural network goes further into deep learning and absorbs various levels of abstraction. Each layer then categorizes a particular type of information, such as an edge. The layer further refines and passes on that information to the next layer, which might collect a series of edges to form angles. The neural network then learns how angles assemble to form a pattern, and so on.
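
The first of those layers, one that categorizes edges, can be sketched as a single convolution with an edge-detecting kernel. In a real network the kernel weights are learned during training rather than written by hand, so the hand-coded kernel below is purely illustrative:

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2-D convolution: slide the kernel over the image."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# Synthetic 6x6 image: dark left half, bright right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# A vertical-edge kernel; a trained network would learn such weights.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

response = conv2d(image, kernel)
print(response)  # the response peaks in the columns straddling the boundary
```

Later layers would combine such edge responses into angles and patterns, exactly as the text describes.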

In the specific application of image recognition, the numerical optimization of an image’s various details—shapes, colors, edges and patterns—generates a so-called “optimal stimulus” representing the canonical “face” of the image. The optimal stimulus is then stored in the application hardware of the convolutional neural network and used as a comparison term in the inference phase, to allow real-time face recognition.
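
A minimal sketch of that inference-time comparison, assuming the stored stimulus takes the form of a feature vector and matching is done by cosine similarity (the vectors and threshold below are purely illustrative, not from any production system):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical stored "optimal stimulus" for a known face.
template = np.array([0.9, 0.1, 0.4, 0.7])

def matches_template(features, threshold=0.95):
    """Inference step: compare a new feature vector against the template."""
    return cosine_similarity(features, template) >= threshold

print(matches_template(np.array([0.88, 0.12, 0.42, 0.69])))  # near-identical vector
print(matches_template(np.array([0.1, 0.9, 0.8, 0.05])))     # dissimilar vector
```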

The type of detection performed by deep learning can also run on more nuanced levels. In context evaluation, for instance, deep learning will recognize that a human face wearing eyeglasses is the same as that without, instead of mistaking the human face for that of another object sporting facial characteristics, such as an animal.

Overall, the deep learning model represents a dramatic departure from AI’s early reliance on conventional or traditional machine learning—a cumbersome process that required experts to manually feed data into learning software. Experts also had to choose manually the features the software should heed, as well as correctly tag and label data—such as identifying images containing cats, so that the software could apprehend the concept of the creature known as “cat.”

In comparison, deep learning allows the detection and recognition of multiple objects, improving perception while reducing power consumption. Traditional algorithms deploying the model known as the histogram of oriented gradients (HOG), used for object detection and tracking, cannot perform actions on multiple objects, and the power consumption needed to deliver the required performance would be prohibitive.
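
For contrast, the HOG descriptor mentioned above can be sketched in simplified form. Real HOG implementations compute histograms over local cells and normalize across blocks, which this toy whole-image version omits:

```python
import numpy as np

def hog_descriptor(image, bins=8):
    """Toy HOG-style descriptor: a histogram of gradient orientations,
    weighted by gradient magnitude, over the whole image. Real HOG works
    on local cells and normalizes over blocks of cells."""
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    orientation = np.arctan2(gy, gx)  # angles in [-pi, pi]
    hist, _ = np.histogram(orientation, bins=bins,
                           range=(-np.pi, np.pi), weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else hist

image = np.zeros((8, 8))
image[:, 4:] = 1.0  # a vertical edge, so horizontal gradients dominate
descriptor = hog_descriptor(image)
print(descriptor.round(3))
```

All the gradient energy lands in the orientation bin around zero, reflecting the single vertical edge; a detector would then compare such descriptors against learned object templates.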

Deep learning in automotive

At its most basic level, an autonomous car trained with deep learning will be able to detect other moving cars, objects in its way, or people on the street. A self-driving car approaching an intersection will yield to pedestrians, but will also know when to engage in more “aggressive” behavior—such as at a four-way stop, where it signals intent to drive through, especially if drivers in the other cars are unresponsive.

Deep learning will likewise be able to differentiate—making distinctions similar to what a normal human brain is capable of doing. It will discern various types of vehicles—a police car from an ambulance, a taxi from a van or a truck, an ambulance from a private car, a parked vehicle from a car pulling out and merging into traffic. It will also tell signals apart—those from a stoplight as opposed to winking brake lights. It could distinguish the types of people populating the street—a cyclist, someone distracted on a smartphone, or a harried type making a last-minute dash to cross the street.

Deep learning can also update and train continuously. Broadly deployed telematics systems in autonomous vehicles, for instance, will enable continuous gathering of real patterns and data on traffic and terrain for training, allowing over-the-air system updates or upgrades.

More important, deep learning is expected to possess deterministic latency. Latency in autonomous driving can be defined as the delay between the time inputs are captured by the car’s sensors, such as cameras and radar, and the time a response is generated by the car’s actuators, such as the brakes. In a real-world implementation of deep learning, this delay is expected to be fixed and predictable, reducing the possibility of indeterminate or erratic response—a deviation that cannot be tolerated where safety is concerned. In comparison, the standard algorithms used to filter noisy inputs from sensors are indeterminate in their timing and require more calculation than deep learning.
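
The deadline requirement can be made concrete with a simple budget check over simulated timestamps; the 100-millisecond budget below is illustrative, in line with the reaction times cited for ADAS-class systems later in this report:

```python
def deadline_met(capture_ts, actuation_ts, budget_s=0.100):
    """True if the sensor-to-actuator delay fits a fixed latency budget.

    Timestamps are in seconds; the 100 ms default budget is illustrative.
    """
    return (actuation_ts - capture_ts) <= budget_s

# Simulated cycles: capture at t=0.0, actuation at two different times.
print(deadline_met(0.0, 0.080))  # 80 ms delay: within the budget
print(deadline_met(0.0, 0.250))  # 250 ms delay: a miss a deterministic design must avoid
```

A deterministic pipeline is one in which this check passes on every cycle, not merely on average.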

Overall, the deep learning model in automotive represents a potentially big advantage for system updates and development costs, implying shorter redesign cycles compared with traditional complex hardware and software systems. While the “brains” for intelligence remain the same between deep learning and traditional machine-learning methods, the time spent in teaching with deep learning produces far more meaningful results.

IHS Markit projects that within ADAS and infotainment applications, artificial intelligence systems will be implemented in approximately 85 million new vehicles worldwide by 2022, up from fewer than 10 million in 2015. Infotainment human-machine interface applications, like speech recognition, will account for the vast majority of the forecast during the period.

Current implementations and applications

Deep learning is already available in some vehicles today—and it is not confined solely to advanced driver assistance systems (ADAS) and autonomous vehicles. In the infotainment human-machine interface (HMI), most speech recognition technologies already rely on neural network algorithms running in the cloud. In particular, the 2015 BMW 7 Series is the first car to use a hybrid approach, offering, on top of cloud-based support, embedded hardware able to perform voice recognition in the absence of wireless connectivity.

Overall, autonomous driving calls for a car to possess situational and contextual awareness of its environment. And to discern the safest path for the vehicle, the self-driving mechanism requires a powerful visual computing system that can integrate in real time the data from cameras and other sensors, along with information from navigation sources.

In particular, deep learning in automotive has applications in two major areas: infotainment and navigation on the one hand; and ADAS on the other.

In the area of infotainment, the impact from deep learning extends further into the realms of speech recognition as well as hand and gesture recognition.

  • In speech recognition, an in-car system trained with deep learning is able to better understand spoken commands, and car passengers can issue command-directed speech that follows the syntax of normal spoken conversation, as opposed to stilted, machine-like language patterns.
  • In gesture recognition, electronic devices use mathematical algorithms to interpret common hand gestures, making gesture-based controls applicable to interactive displays.
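
On the gesture side, a rule-based stand-in for what a trained network would learn can be sketched by classifying a horizontal hand trajectory as a swipe; the coordinates and threshold are illustrative:

```python
def classify_swipe(x_positions, min_travel=0.2):
    """Classify a tracked hand's horizontal trajectory as a swipe gesture.

    A toy rule-based stand-in: a deep learning system would learn this
    mapping from examples instead of using a hand-set threshold.
    """
    travel = x_positions[-1] - x_positions[0]
    if travel > min_travel:
        return "swipe_right"
    if travel < -min_travel:
        return "swipe_left"
    return "none"

# Normalized x-coordinates of a hand across successive camera frames.
print(classify_swipe([0.1, 0.3, 0.5, 0.8]))   # hand moves right
print(classify_swipe([0.9, 0.6, 0.3]))        # hand moves left
print(classify_swipe([0.5, 0.52, 0.49]))      # hand hovers in place
```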

In the area of ADAS, deep learning applies to realms like camera-based machine vision systems and radar-based detection units; driver drowsiness detection; and sensor-fusion electronic control units (ECUs).

  • Through sophisticated radar systems and multiple cameras, cars learn how to self-park, detect pedestrians or other objects on a road, and sense dangerous situations in order to avoid collisions.
  • In drowsiness detection, deep learning detects cognitive distraction caused not only by irregular facial or eye movements but also by erratic driver behavior and biological patterns. Here, algorithms analyze time-series data about the car—e.g., steering—as well as data on the driver—e.g., heart rate—to detect driving behavior deviating from appropriate or expected patterns. The vehicle then issues a warning in the form of an alert, or the car’s AI directly intervenes and takes control to, say, steer the vehicle back into its lane.
  • In sensor fusion, modules in ADAS could be deployed in surround-view park assist and safety-critical functions, including collision warning and adaptive cruise control. In collision warning, systems alert the driver to a potential crash, slowing the vehicle down and bringing the car to a stop before an accident can occur. With autonomous functions like adaptive cruise control, ADAS systems automatically adjust speed to maintain a safe distance from the vehicle ahead in the same lane. Such architectures typically include a front-view camera, other surround-view cameras, radars, modules for LIDAR (a portmanteau of “light” and “radar”), as well as a sensor fusion module that merges the data coming from these sensors. Deploying sensor fusion modules for autonomous functions allows data processing to be centralized, which will let carmakers differentiate their solutions. Shifting processing to one or fewer ECUs also means reduced power consumption and lower solution costs.
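
The drowsiness logic described above—flagging behavior that deviates from expected patterns in time-series data—can be sketched as a rolling z-score check over a steering-angle trace. The window size, threshold and data are illustrative stand-ins for what a trained model would learn:

```python
import numpy as np

def deviation_alert(signal, window=5, z_threshold=3.0):
    """Return indices whose value deviates strongly from the trailing window.

    A toy stand-in for the drowsiness logic: a rolling mean and standard
    deviation over recent steering angles, with a z-score threshold.
    """
    alerts = []
    for i in range(window, len(signal)):
        recent = signal[i - window:i]
        mu, sigma = recent.mean(), recent.std()
        if sigma > 0 and abs(signal[i] - mu) / sigma > z_threshold:
            alerts.append(i)
    return alerts

# Simulated steering-angle trace (degrees): steady driving, then a sudden jerk.
steering = np.array([0.5, 0.6, 0.4, 0.5, 0.6, 0.5, 0.4, 12.0, 0.5])
print(deviation_alert(steering))  # index of the anomalous sample
```

A production system would fuse several such signals—steering, heart rate, eye movement—before raising an alert or intervening.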

Deployment strategies for deep learning in infotainment can rely on cloud-based solutions accessed through smartphones or a telematics link, or on an embedded solution within the vehicle offered together with cloud functionality. For real-time safety applications, in comparison, the hardware implementation of deep learning must be embedded in the car. Moreover, deep learning in ADAS can serve either through a redundant channel that supervises the results from traditional algorithms, or as a stand-alone control unit for active systems.

Important players in the field

Among the important players in automotive deep learning are silicon suppliers, original equipment manufacturers (OEMs) and Tier 1 suppliers.

In the first group of actors are Israel-based Mobileye and the chip companies from California’s Silicon Valley—Nvidia, Intel, Xilinx and Synopsys; CEVA of the United Kingdom; and NXP Semiconductors of the Netherlands. In the second group of players are OEMs like Sweden’s Volvo; US giant Ford; Daimler, BMW and Audi, all from Germany; and outsiders like Tesla, located in California. In the third group are Tier 1 providers like Japan’s Panasonic and Denso; Delphi from Michigan; and Germany’s Bosch.


Among automotive AI programs, Nvidia’s Drive PX is a development platform that allows OEMs to implement deep learning functionality into vehicle systems by running neural networks. Drive PX is deployed today by the likes of Volvo in its self-driving vehicles used in the Goteborg project. Nvidia also has a complete platform, called the Drive CX, for the car’s digital cockpit to enable advanced 3-D navigation and infotainment, natural speech processing and surround vision.

A major player is Mobileye, the leader in system-on-chip (SoC) technology for front-view camera systems, which collaborated with Tesla Motors in supplying the EyeQ3 processor that supports image-analysis intelligence for Tesla’s autonomous driving technology. More functionality is to come in Mobileye’s next-generation chipsets, such as the EyeQ4, in production from 2018; and particularly the EyeQ5, expected to be deployed around 2020 with about 12 tera—one trillion—operations per second in performance at 5 watts of power.

Many issues relating to self-driving vehicles remain of concern, both to the automotive industry and the car-buying public. Safety, safety certification and standardization criteria are obvious matters of interest. For example, the choices a car makes on behalf of its passengers as it ferries them to their destination may require split-second decisions and carry life-saving or life-threatening consequences. And how will the industry certify that performance is up to standard? Also of major concern are liability issues—assigning responsibility in case of autonomous-driving crashes or accidents: will the carmaker be at fault, or will the supplier of the particular part that fails in the car bear the burden?

Such concerns are in addition to important challenges that still stand in the way of mass AI deployment in vehicles, including known limitations at present on hardware architecture and processing technology, as well as on software algorithms.

For deep learning in automotive to advance further, specific investments will need to be made in both hardware and software areas to overcome current challenges.

Required investments in hardware

The issues requiring further investments in hardware for automotive deep learning relate to four main areas: processing architecture; technology node; memory; and power consumption.

In processing architecture, traditional GPUs, long the only game in town, will probably lose their absolute share of the market for running these systems, given their fixed architecture and limited flexibility. To this end, new integrated circuit solutions must provide faster calculation, higher accuracy and lower power consumption. With the above in mind, new dedicated and optimized neural network hardware accelerators and processor architectures are in the works to revolutionize the AI scenario. Specifically for automotive, Mobileye can be considered a precursor on the path toward a dedicated deep learning solution optimized for the car industry.


For their part, floating-point processors will probably need more than 80 megabytes of cache or SRAM—static random access memory—for the storage of millions of parameters, unless compression techniques are used. SRAM requirements are strictly connected to technology scaling and the associated cost per wafer, so solutions based on 8-bit-wide or even 4-bit-wide instructions are under evaluation.
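
The 80-megabyte figure follows from simple arithmetic—20 million parameters stored as 32-bit floats occupy 80 MB, and narrowing the representation shrinks the footprint proportionally. A sketch, with an illustrative parameter count:

```python
def model_size_mb(num_params, bits_per_param):
    """On-chip storage needed for the network weights, in megabytes."""
    return num_params * bits_per_param / 8 / 1_000_000

params = 20_000_000  # illustrative parameter count for a perception network
print(model_size_mb(params, 32))  # 32-bit floating point
print(model_size_mb(params, 8))   # 8-bit quantized
print(model_size_mb(params, 4))   # 4-bit quantized
```

This is why the narrow-precision solutions mentioned above are attractive: quantizing from 32 bits to 8 or 4 bits cuts SRAM requirements by a factor of four to eight.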

In terms of technology node, 16-nanometer (nm) lithography is likely needed to reach the required processing speed and power performance for automotive deep learning. However, manufacturing at the 16-nm node is extremely costly if developed just for automotive applications, and the return on investment will be at risk if no adjacent market supports the volume. A mask set, for instance, costs about $5 million, which means that developing the SoC is expected to cost about $100 million. Also on their way in the automotive market are silicon-on-insulator (SOI) technologies and FinFETs—field-effect transistors employing so-called fins—to increase performance while reducing power consumption.

As to memory and optimization, the density problem for automotive deep learning systems can be reduced as a first step within the next five years through compression techniques along with optimized algorithms and processor architectures—for example, by moving down from floating-point processing to 4-bit or even 2-bit precision. More memory might be required further in the future—but only when network complexity has become much greater, handling several million parameters. Suppliers like California-based Altera also say that solutions in the form of field-programmable gate arrays (FPGAs) are expected to evolve rapidly to specifically address deep learning needs, taking advantage of flexible programmable logic.

The fourth area in hardware has to do with power consumption. Individual sensor subsystems, such as the front-view camera or LIDAR module, need to stay within a 4-watt range to operate within budgeted costs. In the case of sensor fusion ECUs, 15-20 watts or more might work, but some OEMs already expect a trade-off here between power and performance if no suitable silicon solution becomes available.

Required investments in software

In the deployment time frame for deep learning in automotive, software is expected to be the first area tackled in order to improve on today’s performance. Better-performing hardware solutions will not be available until later—estimated at more than five years out—coming after the software improvements.

The investments required in software for automotive deep learning have to do mainly with algorithms. Limitations exist today in the available algorithms, which were developed and optimized for GPUs. For instance, state-of-the-art algorithms are not optimized for input/output (I/O) switching in microcontrollers (MCUs). Moreover, current software implementations of these networks generally lack recurrence—a standard memory feature—and make limited use of data compression.

As a result, new algorithms in the future will not necessarily be optimized for GPU architecture, which has been the normal course of action up until the present. This, in turn, is expected to boost the development of new, deep-learning-specific accelerator and processing devices in the near future.

Most players today also indicate that new algorithms will constitute the breakthrough in artificial intelligence next year, in the process changing the automotive scenario. Here new solutions must allow simple algorithms and math to reduce power consumption and increase performance in machine vision. These new solutions will also impact the decisional process of neural networks, or how the network makes decisions in a range of autonomous driving tasks and scenarios. Overall, systems based on deep learning need to react in less than 100 milliseconds, similar to ADAS mechanisms.

In particular, such new solutions will be required for ISO 26262 certification for functional safety, which remains the biggest challenge for deep-learning-based systems. At present, it is not clear how the validation and certification of a “virtual” brain will be conducted under ISO 26262. Nonetheless, simulation, validation and testing are a “must” for deep learning, and the automotive industry will probably need to set a clear standardization procedure for validating and testing deep learning systems, which will also partly address certification for functional safety.

The road ahead

Deep learning needs to evolve faster in automotive than the usual pace of development taking place in the traditional automotive supply chain. And for development to move more quickly, partnerships are a must across traditional automotive players and other industries, software companies, universities and startups, so that expertise and technology from other fields besides automotive can be brought in.

In fact, algorithms and software developed for other industries cannot simply be reused for automotive, as they would be obsolete by the time they land in the automotive space—which means the algorithms and software need to be developed from scratch for automotive. Moreover, new microcontroller architectures are required, with acceleration specifically designed for artificial intelligence and neural network structures.

As deep learning takes increased hold in automotive, opportunities will arise for monetization and cost sharing, and infotainment systems based on deep learning will be best placed to take advantage of what by then will be the predominant trend of connectivity to the cloud.

New business models will also emerge as the autonomous vehicle market gains full speed, and new players beyond vehicle manufacturers will jump in to take advantage and provide new services.

The latest announcement of a collaboration between Nvidia and Chinese web services giant Baidu already showcases such opportunities. The two world-class AI companies are seeking to bring together their technical capabilities and expertise to build a viable architecture for self-driving cars—“from end to end, from top to bottom, from the cloud to the car,” according to Nvidia’s CEO.

In particular, tagged images, contextual and situational patterns, maps, road conditions and parking-space data will generate a tremendous amount of valuable data for the vehicle. Such data will affect not only the vehicle but also the data farms and infrastructure that will be required to store the data safely and parse it quickly in order to add value and upgrade vehicle functionality.

The battle, from now on, will be about who owns the data.