Mobile phones and tablets are driving eMMC volume, pushing faster speeds and a transition to UFS and other higher-speed interfaces. Other applications such as Ultrabooks and enterprise storage solutions are spurring SSD growth, demanding even greater quality and higher speeds for both ONFi and Toggle NAND interfaces. Market research firm IC Insights forecasts that over the next four years the NAND Flash market will have the third-highest revenue-growth rate among all semiconductor segments as well as stronger bit growth than DRAM.

To keep up with market demands, NAND technology is evolving rapidly, in fact shifting intrinsically. All major NAND manufacturers have begun sub-20nm process migration, further shrinking the basic 2-D NAND memory cell design. Many industry experts believe that 2-D NAND scaling will eventually reach its limit at the 10nm process node. As a result, the industry is adopting novel 3-D NAND process architectures such as Samsung TCAT, Toshiba BiCS, SK Hynix DC-SF and Macronix BE-SONOS, which drastically change the basic memory cell design.

These advances in NAND Flash technology have raised quality and reliability challenges for NAND manufacturers, renewing the need for increased test coverage without driving up the cost of test and dampening market growth. To deliver the needed performance economically, NAND manufacturers need a test solution with an architecture capable of increasing yield and throughput. This paper examines NAND Flash test requirements at the package level and shows how the right ATE architecture can meet them most efficiently.

NAND Flash Needs a Dedicated Test Solution

As with other commodity products, the per-bit ASP of NAND Flash has been declining dramatically. With the recent industry consolidation, per-bit ASP has stabilized somewhat; still, Gartner's 2Q13 forecast shows a CAGR of -18.7% from 2012 to 2017. As a result, NAND Flash manufacturers are under constant pressure to reduce cost, and one of the ways they do so is to minimize the cost of test. This places the burden on IC manufacturers to use the most economical NAND test solution that delivers high throughput and maximizes yield.

A dedicated tester is necessary because NAND Flash requires test capabilities that are not commonly found on other memory testers. Just as importantly, the tester needs to be optimized for NAND Flash testing and should not carry extraneous capabilities that would increase cost. This paper covers several of these key NAND Flash test capabilities.

Tester-per-Site Architecture

The architecture with the best throughput for NAND Flash is a true tester-per-site architecture, in which each test site tests one DUT and a system contains many test sites to support the required parallelism. Each test site has its own independent test resources: a test processor, an algorithmic pattern generator, parametric measurement units, buffer memory, fail memory and more. However, a one-DUT-per-site design is very expensive and may not offer the lowest overall cost of test. Typically, a tester-per-site architecture testing between two and eight DUTs per site offers the optimal balance of tester cost and throughput, resulting in the lowest overall cost of test.
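To make this tradeoff concrete, the hypothetical Python model below amortizes per-site hardware cost over a tester's output while stretching test time as more DUTs share one site's resources. Every constant, and the superlinear sharing penalty in particular, is an illustrative assumption rather than measured data; the point is only that cost of test bottoms out at a moderate number of DUTs per site.

```python
# Illustrative cost-of-test model for tester-per-site configurations.
# All figures are hypothetical assumptions chosen for demonstration; real
# site costs, test times, and sharing penalties are tester-specific.

SITE_COST = 10_000.0          # assumed cost of one fully resourced test site ($)
BASE_TEST_TIME_S = 600.0      # assumed test time with one DUT per site (s)
SHARING_PENALTY = 0.08        # assumed serialization penalty; grows superlinearly
                              # as more DUTs contend for shared site resources
LIFETIME_DEVICES = 1_000_000  # devices tested over the tester's assumed life

def cost_of_test(duts_per_site: int, total_duts: int = 512) -> float:
    """Return an illustrative cost of test per device, in dollars."""
    sites = total_duts // duts_per_site
    capital = sites * SITE_COST
    # Sharing a site's resources serializes some operations, so test time is
    # assumed to grow faster than linearly with DUTs per site.
    test_time = BASE_TEST_TIME_S * (1 + SHARING_PENALTY * (duts_per_site - 1) ** 2)
    # Amortized capital per device, weighted by the test-time stretch.
    return (capital / LIFETIME_DEVICES) * (test_time / BASE_TEST_TIME_S)

for n in (1, 2, 4, 8, 16):
    print(f"{n:2d} DUTs per site -> ${cost_of_test(n):.2f} per device")
```

Under these assumed numbers, cost of test per device is minimized at four DUTs per site, consistent with the two-to-eight range cited above.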

Figure 1 illustrates the point by comparing two tester-per-site architectures in terms of test time and cost of test. Tester A accommodates a smaller number of DUTs per test site than Tester B; therefore, Tester A offers both lower test time and lower cost of test.

Figure 1: Comparison of various tester-per-site architectures

Real-time Source-Synchronous Testing

To meet the demand for higher data-transfer rates in NAND Flash applications, the industry has introduced DDR NAND Flash and managed NAND Flash such as eMMC. These devices already support 200 MT/s and 400 MT/s data-transfer rates, and they will need to support even higher rates in the future. As data-transfer rates increase, source-synchronous capability is added to the NAND interface through a bidirectional DQS signal that gives better control of data-interface timing. This is another test requirement that is best addressed by selecting the right tester architecture to improve yield and throughput.

One of the key challenges in supporting source-synchronous functionality arises during device data-output operations. The ONFi 3.0 specification defines two critical timing parameters, tDQSRE and tDQSQ. These parameters have very wide ranges and are extremely sensitive to process, voltage and temperature (PVT). Most existing testers do not have real-time source-synchronous support, so users typically rely on a process called “training” to determine these parameters. Training is a time-consuming operation in which a simplified read pattern is looped while strobe-timing edges are finely adjusted over a wide range until the desired result is found. Because both timing parameters are PVT-sensitive, training must be repeated for every DUT whenever voltage conditions change in the test flow, incurring significant test-time overhead. Furthermore, device operation changes the device temperature, and together with jitter this shifts the timing parameters during the test itself, which training cannot resolve. If a tester cannot compensate automatically in real time with a source-synchronous function, yield loss is possible, especially at data-transfer rates above 400 MT/s where the data eye is significantly reduced.
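The following sketch shows the essence of such a training sweep. The tester API is deliberately abstracted away as a callable, since the actual pattern and strobe controls vary by platform; the step size, sweep range, and stand-in data eye are all illustrative assumptions.

```python
# Minimal sketch of strobe "training": loop a simplified read pattern while
# sweeping the strobe offset, then center the strobe in the passing window.
# run_read_pattern stands in for a platform-specific tester call.
from typing import Callable

def train_strobe(
    run_read_pattern: Callable[[int], bool],  # True if the read passes at offset
    step_ps: int = 50,       # assumed strobe step size (ps)
    sweep_ps: int = 5_000,   # assumed +/- sweep range around nominal (ps)
) -> int:
    """Sweep the read-strobe offset and return the center of the data eye."""
    passing = [off for off in range(-sweep_ps, sweep_ps + step_ps, step_ps)
               if run_read_pattern(off)]
    if not passing:
        raise RuntimeError("no passing strobe offset found")
    # Center the strobe in the passing window to maximize timing margin.
    return (min(passing) + max(passing)) // 2

# Example with a stand-in device whose data eye spans -800 to +1200 ps:
center = train_strobe(lambda off: -800 <= off <= 1200)
print(f"trained strobe offset: {center} ps")  # -> 200 ps
```

Even this simplified sweep loops the read pattern roughly 200 times per DUT, and it must be rerun whenever voltage conditions change. That is precisely the overhead a real-time source-synchronous function eliminates.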

A tester with a real-time source-synchronous function similar to that shown in Figure 2 can automatically compensate for tDQSRE and tDQSQ dispersion across multiple devices on a cycle-by-cycle basis, despite changing PVT. The net result is test execution with a guaranteed data eye, which maximizes yield when testing at data rates above 400 MT/s. Such a tester can also eliminate all related training operations, significantly reducing test time.

Figure 2: Real-time source-synchronous function

Real-time ECC Analysis

To meet the NAND market's ever-increasing demand for bit growth, manufacturers have advanced to sub-20nm process lithography and raised the number of bits per cell. At the 10nm process node, trend data suggest a floating-gate NAND memory cell will hold as few as 10 electrons; with 3 bits per cell, the data stored in such a cell changes with the addition or removal of just one or two electrons in the floating gate. Manufacturers are also adopting novel 3-D processes. All of these changes affect the quality and reliability of NAND Flash, which must still meet the performance demands of SSDs in Ultrabooks and enterprise storage solutions. As a result, NAND manufacturers depend increasingly on stronger ECC capabilities, which can be addressed by the tester architecture. Most of today's testers do not offer real-time ECC support.

Therefore, users typically must perform a post-processing operation to determine ECC solutions for every DUT. This operation is time-consuming and results in significant test-time overhead, especially for test flows that perform many read operations with ECC analysis enabled. It also requires significant tester resources, including a large error-capture RAM to store the fail bitmap of every DUT and high-performance test processors to analyze those bitmaps and derive ECC solutions. The net result is a significant increase in the cost of test to support ECC analysis.

However, it is possible to implement a low-cost ECC analysis function with on-the-fly analysis to achieve a lower cost of test. This ECC analysis function would have the following attributes (a brief sketch of the grading idea follows the list):

  • On-the-fly ECC analysis: ECC analysis is done on the fly across all DUTs, eliminating all related test-time overhead. It also reduces tester cost because it does not require expensive tester resources such as a large error-capture RAM and many high-performance test processors.
  • Flexibility: NAND manufacturers sell their raw NAND to various customers, each with its own proprietary controller and ECC algorithm supporting different data organizations. The solution must be flexible enough to support different ECC algorithms.
  • Concurrent execution of multiple ECC algorithms: For the same reason as above, it is beneficial to support concurrent execution of multiple ECC algorithms to save test time.
  • Support for ECC grading: Besides providing pass/fail status, it is also advantageous to store fail counts per ECC sector so that the data can be analyzed, graded and binned as necessary. This gives NAND manufacturers the flexibility to sell devices to customers with different fail-bit thresholds per ECC sector, allowing them to improve profit on devices that would otherwise be rejected or binned at lower densities.
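The sketch below illustrates the grading attribute only. It assumes a hypothetical stream of per-sector fail counts coming from on-the-fly compare hardware; the thresholds, bin names, and stand-in results are illustrative, not drawn from any specification.

```python
# Minimal sketch of per-sector fail counting and ECC grading. Fail counts are
# accumulated as read results stream in (on the fly), instead of storing a
# full fail bitmap in error-capture RAM for post-processing.
from collections import Counter

# Assumed grade thresholds: max correctable fail bits per ECC sector.
GRADE_THRESHOLDS = {"A": 4, "B": 8, "C": 16}  # illustrative values only

def grade_device(sector_fail_counts: dict[int, int]) -> str:
    """Grade a DUT by its worst ECC sector; 'reject' if beyond all grades."""
    worst = max(sector_fail_counts.values(), default=0)
    for grade, limit in GRADE_THRESHOLDS.items():
        if worst <= limit:
            return grade
    return "reject"

fails = Counter()
for sector, fail_bits in [(0, 1), (3, 6), (7, 2)]:  # stand-in read results
    fails[sector] += fail_bits

print(grade_device(fails))  # -> "B": worst sector has 6 fail bits
```

A device graded "B" here could still be sold to a customer whose controller corrects up to 8 fail bits per sector, rather than being rejected outright.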

High-Current PPS for Higher Throughput

To achieve the higher densities and integration needed by mobile applications, NAND manufacturers are stacking as many as 16 dies per package. Testing all the bits in a package with a capacity nearing 1 terabit, or higher in the future, requires a very long test time, which directly increases the cost of test. One technique to reduce test time is to test all dies in a package concurrently, especially for long operations such as array program and erase. However, this requires a significant amount of power-supply current per DUT. A tester with a scalable power-supply architecture that enables a flexible PPS pin count per DUT can reduce test time as well as cost of test.

Figure 3 shows the test-time impact of array program/erase operations among testers with different PPS pin counts per DUT. The test-time savings are significant for testers with a high PPS pin count per DUT, especially as more dies are stacked in a package.

Figure 3: Test time for parallel NAND program/erase operations versus number of PPS per DUT

Many existing testers do not have PPS resources with the voltage and ICC current required to support concurrent array program/erase operations for up to 16 dies per package. In such cases, some testers allow ganging two or more PPS channels to increase the available ICC current. Unfortunately, the PPS pin count is fixed per system on most existing testers. As a result, either test time must be extended to serialize the array program and erase operations across all dies in the package, or parallelism must be reduced to free up more PPS channels per DUT. Either way, the result is lower throughput, which increases the cost of test.

A tester with a scalable PPS architecture, on the other hand, allows a flexible number of PPS channels per DUT, enabling significantly lower test time as well as cost of test.
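The model below sketches this relationship. Dies are programmed/erased in groups sized by the available PPS current per DUT; all currents and times are hypothetical assumptions for demonstration, not data from Figure 3.

```python
# Illustrative model of array program/erase test time versus PPS per DUT.
# All numbers below are hypothetical assumptions for demonstration only.
import math

DIES_PER_PACKAGE = 16
ICC_PER_DIE_MA = 80       # assumed program/erase supply current per die (mA)
ICC_PER_PPS_MA = 400      # assumed current capability of one PPS channel (mA)
PGM_ERASE_TIME_S = 120.0  # assumed array program/erase time for one group (s)

def package_test_time(pps_per_dut: int) -> float:
    """Test time when dies are programmed/erased in current-limited groups."""
    # How many dies the available PPS current can drive at once:
    concurrent = min(DIES_PER_PACKAGE,
                     (pps_per_dut * ICC_PER_PPS_MA) // ICC_PER_DIE_MA)
    # Remaining dies must be serialized into additional groups.
    groups = math.ceil(DIES_PER_PACKAGE / max(concurrent, 1))
    return groups * PGM_ERASE_TIME_S

for pps in (1, 2, 4):
    print(f"{pps} PPS per DUT -> {package_test_time(pps):6.1f} s")
```

With these assumed numbers, one PPS channel per DUT forces four serialized groups (480 s), while four channels drive all 16 dies at once (120 s), mirroring the scaling behavior described above.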

Conclusion

The NAND Flash market is expected to continue growing for the foreseeable future, driven by expanding use in mobile phones, tablets and SSDs. These applications will require continued increases in density and performance while demanding higher quality and reliability. To keep pace, NAND manufacturers are developing innovative new NAND Flash technologies that require more test coverage and drive up the cost of test. To manage testing costs, NAND manufacturers need a dedicated NAND Flash test solution with a scalable tester-per-site architecture, optimized AC performance, and other NAND-specific features to increase yield and improve throughput.