The semiconductor industry has seen its fastest growth in the automotive market over the last five-plus years. Semiconductors in automobiles are trending towards increased complexity with the rise of connectivity, smart functions and sophisticated safety devices. This increasing device complexity challenges the industry because it also pushes the limits of manufacturability and defect detection.
There are reports of alarming trends in electrical and electronic field failures that require attention. Stout, Risius and Ross (SRR) reported a surge in EWRs (Early Warning Reports) related to electrical systems in the 10 years ending in 2014; see the chart from the SRR report in Figure 1. EWRs are field incidents that may involve injury or death and are reported by carmakers to the NHTSA (National Highway Traffic Safety Administration). EWRs trigger investigations that could turn into recalls. In addition to the SRR report, a joint paper by Taylor Johnson (Department of Computer Science and Engineering, University of Texas at Arlington) and Raghunath Gannamaraju and Sebastian Fischmeister (Department of Electrical and Computer Engineering, University of Waterloo) documents a more detailed study of data from NHTSA and from its Canadian and European counterparts, Transport Canada and RAPEX (Rapid Alert System for non-food dangerous products). This academic paper confirms the clear upward trend in the number of electronic and electrical hazard and risk-related notifications in the operation of motor vehicles. It also links this trend to the increasing number of ECUs (electronic control units): with approximately 50-75 ECUs each, modern cars are “truly distributed computer systems running on wheels”.
Figure 1: Chart from the Stout, Risius and Ross 2015 Automotive Warranty Recall report showing a surge in the electrical EWRs in the 10 years ending in 2014.
In the last few decades, the automotive industry has established the most stringent quality management standard (ISO/TS 16949), core tools (FMEA, APQP, PPAP, MSA and SPC), electronic qualification standards (AEC-Q100 and family) and, more recently, the ISO 26262 standard (Road Vehicles Functional Safety). But the increasing IC and process complexity, the rise in the number of ECUs in the car and the surge in electronic-related incidents require the industry to use every means at its disposal to proactively prevent field failures. Part of the solution lies in the huge amount of data that fabs and test houses continue to accumulate rapidly. The tremendous potential in this data can be unlocked with advanced analytics.
A novel analytics technique for detecting potential field failures is the composite distance technique. The method predicts marginalities that involve many variables at once, a condition that goes undetected by current single-parameter screens or SPC. The tens of thousands of variables in today’s complex designs and equally complex process technologies can only be handled by advanced analytics. At this scale, conventional techniques produce inaccurate models, a problem known as the “curse of dimensionality”.
The composite distance algorithm first sifts through the tens of thousands of variables and identifies those with the greatest influence on the marginality. The composite distance is derived from these key variables. It measures how anomalous a marginal product is. When this composite parameter is deployed in an inline control chart with WAT, sort, final test or fab process parameter data, it identifies the parts with a high probability of failing in the field. In multiple studies worldwide, the technique has detected actual field failures at several leading semiconductor and electronics manufacturers. The field failure data involved top carmakers and tier 1 auto suppliers in the US and Europe. See the charts in Figures 2 and 3, which illustrate how actual field failures were detected by the technique.
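The text does not spell out how the key variables are chosen or how the distance is computed, so the Python sketch below is only one plausible reading of the steps described above, not the actual algorithm. It assumes that variables are ranked by the mean absolute z-score of a few known field returns, that the composite distance is the Euclidean norm of the z-scores over the selected variables, and that the control limit is the mean plus three standard deviations of that distance. The data is synthetic, and every name (`select_key_variables`, `composite_distance`, `param_00` and so on) is hypothetical.

```python
import math
import random
import statistics


def standardize(values):
    """Convert one parameter's raw readings to z-scores."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [(v - mean) / sigma for v in values]


def select_key_variables(z, reference_failures, top_k):
    """Assumed ranking: mean |z| of known field returns, highest first."""
    score = {
        name: statistics.fmean([abs(column[i]) for i in reference_failures])
        for name, column in z.items()
    }
    return sorted(score, key=score.get, reverse=True)[:top_k]


def composite_distance(z, key_variables, n_units):
    """Assumed metric: Euclidean norm of z-scores over the key variables."""
    return [
        math.sqrt(sum(z[name][i] ** 2 for name in key_variables))
        for i in range(n_units)
    ]


# Synthetic stand-in for test data: 500 units x 30 parameters.
rng = random.Random(42)
n_units = 500
names = [f"param_{j:02d}" for j in range(30)]
data = {name: [rng.gauss(0.0, 1.0) for _ in range(n_units)] for name in names}

# Marginal units sit 2.5 sigma off nominal on six parameters, nominal elsewhere.
shifted = names[3:9]
marginal_units = [10, 120, 250, 400]
for i in marginal_units:
    for name in names:
        data[name][i] = 2.5 if name in shifted else 0.0

reference_failures = marginal_units[:3]  # pretend these three were returned
z = {name: standardize(column) for name, column in data.items()}
key_variables = select_key_variables(z, reference_failures, top_k=6)
distance = composite_distance(z, key_variables, n_units)

# Control limit on the composite parameter (assumed 3-sigma rule).
limit = statistics.fmean(distance) + 3 * statistics.pstdev(distance)
flagged = [i for i, d in enumerate(distance) if d > limit]

# Largest single-parameter deviation of any marginal unit, in sigma.
worst_single_z = max(abs(z[name][i]) for name in names for i in marginal_units)

print("key variables:", sorted(key_variables))
print("worst single-parameter |z|:", round(worst_single_z, 2))
print("unit 400 flagged by composite distance:", 400 in flagged)
```

In this toy run, every marginal unit stays inside a 3-sigma limit on each individual parameter, yet unit 400, which was not among the reference returns, still exceeds the composite limit. A production version would probably need a metric that accounts for correlation between parameters, such as the Mahalanobis distance, and a more robust way of ranking tens of thousands of variables.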
Figure 2: The two highlighted units are actual field failures that passed the specification limits at a semiconductor company. They failed in early life at the tier 1 supplier and the carmaker. The Composite Distance chart shows that the algorithm detects them as outliers.
Figure 3: Charts comparing the accuracy and purity of the Composite Distance and Single Parameter control charts, the latter using the top parameter of importance.
Top: Composite distance chart for an electronic manufacturer. Red points above the red limit line are actual automotive field failures.
Bottom: The single-parameter chart of the topmost parameter cannot catch the field failures, which all fall below the red spec limit. If the limit line is lowered (blue line) to catch a portion of the field failures, more good units have to be rejected. The composite distance method detects field failures better without rejecting too many good units.
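The trade-off described for Figure 3 can be reproduced on synthetic data. The Python sketch below is an illustration under stated assumptions, not the analysis behind the figure: readings are treated as already standardized, the composite distance is assumed to be the Euclidean norm over six key parameters, and the tightened limit of 2.2 sigma is an arbitrary choice. All names are hypothetical.

```python
import math
import random
import statistics

rng = random.Random(7)
n_units, n_key_params = 2000, 6

# Baseline population: independent, already-standardized readings.
units = [[rng.gauss(0.0, 1.0) for _ in range(n_key_params)] for _ in range(n_units)]

# Field failures: 2.5 sigma off nominal on every key parameter.
field_failures = {50, 700, 1500}
for i in field_failures:
    units[i] = [2.5] * n_key_params


def tally(flags):
    """Return (field failures caught, good units rejected) for flagged indices."""
    caught = len(flags & field_failures)
    return caught, len(flags) - caught


# Single-parameter chart on the top parameter (column 0 in this toy data).
top = [u[0] for u in units]
mu, sigma = statistics.fmean(top), statistics.pstdev(top)
spec_screen = {i for i, v in enumerate(top) if v > mu + 3.0 * sigma}
tight_screen = {i for i, v in enumerate(top) if v > mu + 2.2 * sigma}

# Composite distance chart with an assumed 3-sigma control limit.
dist = [math.sqrt(sum(v * v for v in u)) for u in units]
limit = statistics.fmean(dist) + 3.0 * statistics.pstdev(dist)
composite_screen = {i for i, d in enumerate(dist) if d > limit}

for label, flags in [
    ("single parameter, 3.0 sigma limit", spec_screen),
    ("single parameter, 2.2 sigma limit", tight_screen),
    ("composite distance", composite_screen),
]:
    caught, overkill = tally(flags)
    print(f"{label}: caught {caught} of {len(field_failures)}, rejected {overkill} good units")
```

With these assumptions, the original single-parameter limit misses every field failure, the tightened limit catches them only by rejecting more good units, and the composite chart catches them with fewer good units lost.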
The composite distance technique can be employed throughout the product lifecycle. During the design and development phase, it can be used to detect anomalous parts in supplier production or qualification data, which provides more stringent control over supplier qualifications. During high-volume production, it can be applied to critical material or component suppliers to ensure that anomalous material is removed from semiconductor or electronics manufacturing. Its inline deployment as a process control and screening method is a must for automotive Safe Launch, to ensure rapid elimination of potential field failures before high-volume production starts. Automotive Safe Launch is a period in which data collection is significantly greater than in standard production. It aims to identify and rapidly eliminate failure mechanisms that may impact high-volume production. Anomaly detection via the composite distance technique rapidly reduces the field failures that typically occur as the high-volume ramp commences.
In general, the physics of and industry evidence on field failures, whether early-life or operating-life failures, indicate that the failures are marginal product. The composite distance studies and the deployment strategies recommended above provide a powerful method for screening potential reliability failures without the need for costly burn-in. The technique also enables manufacturers to increase defect coverage for many products with the same algorithm, without the need for costly test development and hardware.
The financial impact of automotive field failures on semiconductor companies can be prohibitive. The billions of dollars that carmakers and tier 1 suppliers have incurred in regulatory fines and litigation costs for recalls such as the GM ignition switch, Takata airbag and Toyota sudden acceleration issues are widely known. For semiconductor companies, there are significant internal costs in engineering and administrative man-hours for automotive returns handling, failure analysis and corrective action. A cost analysis of field returns and quality issues for an actual device typical of today’s complex designs covered 91 devices returned in the first year of life. The returns, from a top carmaker and a tier 1 supplier, incurred around $1.7M in technical and administrative personnel time spent on handling, failure analysis, corrective action and customer quality issue resolution. This excludes additional costs that the quality agreements with tier 1 customers require suppliers to bear, including the tier 1 supplier’s failure handling and analysis time and the OEM’s cost of removing the failed parts from cars. The composite distance analysis has been shown to detect as much as 70% of actual field failures. Deploying the method as an inline screen therefore yields estimated savings of around $1M per device in the first year of life. Furthermore, deploying the technique achieves an ROI of at least 300% within the first year.
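The arithmetic behind these figures can be checked in a few lines of Python. The inputs below are the numbers quoted in the text; the ROI formula and the resulting bound on deployment cost are assumptions added for illustration and do not come from the cost analysis itself.

```python
# Figures quoted in the text.
returned_devices = 91
internal_cost_usd = 1_700_000
detection_rate = 0.70
minimum_roi = 3.0  # 300%

cost_per_return = internal_cost_usd / returned_devices
avoided_cost = internal_cost_usd * detection_rate

# Assumption: ROI = (savings - deployment cost) / deployment cost.
max_deployment_cost = avoided_cost / (1.0 + minimum_roi)

print(f"internal cost per returned device: ${cost_per_return:,.0f}")
print(f"avoided internal cost at 70% detection: ${avoided_cost:,.0f}")
print(f"deployment cost consistent with 300% ROI: at most ${max_deployment_cost:,.0f}")
```

At 70% detection, the avoided internal cost is about $1.19M, which is consistent with the estimate of around $1M. Under the assumed ROI definition, a 300% return corresponds to a deployment cost of no more than roughly $0.3M in the first year.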
The use of advanced analytics enables automotive semiconductor suppliers to rapidly achieve zero defects and to prevent the safety incidents that could otherwise accompany the trend towards complex functionality and fab technology. It also enables huge cost savings in failure analysis, returns handling, customer issue resolution, corrective action and other mitigation activities such as test development and burn-in. Finally, it yields additional ROI on IT infrastructure investments in big data, because it taps the hidden potential of data that already exists in the semiconductor supply chain.