Identifying Optimal Non-volatile
Semiconductor Memory for Use in RAID
Systems
Barry Hoberman, Business Development, Crocus Technology
Steve Cliadakis, Business Development, Crocus Technology & Silicon Impact
Redundant array of inexpensive disks (RAID) systems increase
server performance and protect against data loss by exploiting
disk-level parallelism. Without RAID servers, there would
essentially be no commercial Internet. Because they hold mission-critical
data and must provide fault-tolerant storage, enterprise
customers cannot tolerate any potential for RAID server data loss in
a world of less-than-perfect electrical power sources. Consequently,
RAID system designers have tried using many non-volatile
semiconductor memories, including NAND Flash, non-volatile static
random access memory (NVSRAM), ferroelectric RAM (FeRAM or
FRAM) and magnetoresistive RAM (MRAM), to retain data in the
event of a power failure. Of these, MRAM comes closest to the ideal
memory for RAID system server design.
The commercial behemoth known as the World Wide Web sits
layered on an intricate, distributed data network called the Internet.
System architects build and extend the Internet's very foundation
using storage blocks called hard disk drives (HDDs). Shelves full
of HDDs organized into RAID systems inhabit countless racks
in worldwide data centers. The improved performance and fault
tolerance of RAID systems explain their universal acceptance as
the local and networked storage medium of choice for all types of
Internet servers, other on-line transaction processing (OLTP) servers,
and many other server types installed in large and small data centers
worldwide. According to estimates and forecasts, several million of
these servers ship each year.
RAID systems use sophisticated, fault-tolerant methods of data
storage, including data mirroring and striping, to distribute data
across multiple drives and thus protect data against loss. These
methods make automated recovery possible when an individual
HDD fails. Real-time data recovery, possible with more advanced
hardware RAID systems, ensures that important and mission-critical
data remains safe even after catastrophic hardware failure. Although
a RAID system's disk drives provide much of the non-volatile data
storage needed to provide fault-tolerant operation and data recovery
ability, there is a real need for non-volatile semiconductor memory
within the design of the RAID controller. Without non-volatile chip
memory, a RAID system cannot protect data against certain types of
failures such as power loss.
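
As a rough illustration of the striping and mirroring described above, the following Python sketch distributes blocks round-robin across drives, keeps a mirror copy of each drive, and reads the data back after one drive is lost. The function names (stripe, mirror, read_back) and the drive model (plain Python lists, with a failed drive represented as None) are assumptions made for illustration; they do not describe any particular RAID controller.

```python
# Illustrative model only: drives are Python lists, blocks are bytes objects.

def stripe(blocks, n_drives):
    """Distribute blocks round-robin across n_drives (striping)."""
    drives = [[] for _ in range(n_drives)]
    for i, block in enumerate(blocks):
        drives[i % n_drives].append(block)
    return drives

def mirror(drives):
    """Return an independent copy of every drive (mirroring)."""
    return [list(d) for d in drives]

def read_back(primary, mirrors, n_blocks):
    """Reassemble the block sequence, using the mirror of any failed drive.

    A failed drive is modeled as None (an assumption of this sketch).
    """
    n = len(primary)
    out = []
    for i in range(n_blocks):
        drive = primary[i % n]
        if drive is None:
            drive = mirrors[i % n]
        out.append(drive[i // n])
    return out

blocks = [b"A", b"B", b"C", b"D", b"E"]
primary = stripe(blocks, 2)
mirrors = mirror(primary)
primary[1] = None  # simulate the loss of one drive
recovered = read_back(primary, mirrors, len(blocks))
```

In this toy model the recovered sequence matches the original blocks because every block on the failed drive still exists on its mirror.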
Figure 1 shows a block diagram of a RAID system. The RAID
server's heart is the RAID control processor, which manages the
attached drive array through a bank of industry-standard serial
attached SCSI (SAS) and serial ATA (SATA) drive interfaces. Although
largely based on HDDs, RAID systems increasingly include one or
more solid-state drives (SSDs), which themselves are based on arrays
of NAND Flash semiconductor memories. SSDs provide roughly
10x faster write performance and 100x faster read performance
than HDDs, but they cost substantially more per gigabyte (GB) of
storage. Hybrid RAID systems that combine HDD and SSD storage
currently offer the best available mix of performance and capacity.
Note that Figure 1 shows a superset of all possible methods used to
provide non-volatile storage in the server (shown as colored boxes).
Practical RAID designs use some but not all of these methods.
Figure 1. Block Diagram of a RAID Server

The RAID control processor currently requires a mix of
semiconductor memory, including DRAM and non-volatile
semiconductor memory. Local non-volatile memory usually serves as
a repository for the hardware RAID server's firmware and as non-volatile
storage of configuration data and of a journal/transaction/error log file. The right side of Figure 1 shows a large box labeled
"Primary RAID Memory Cache." This cache speeds disk write
transactions from the host server's perspective. The RAID controller
can quickly stash write transactions in fast memory cache and then
signal transaction completion to the host. Then, the RAID controller
moves the transaction data from the primary cache into the disk array,
which is a relatively slow process compared to saving the transaction
in the cache.
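
The write path just described can be sketched in a few lines of Python. This is a toy model under stated assumptions: the class name WriteBackCache, its methods, and the use of dictionaries for the cache and the disk array are all hypothetical and chosen only to show the ordering of events (cache the write, signal completion, flush later).

```python
class WriteBackCache:
    """Toy model: acknowledge writes once cached, flush to disk later."""

    def __init__(self):
        self.cache = {}   # fast memory: block address -> data not yet on disk
        self.disk = {}    # slow disk array

    def write(self, address, data):
        self.cache[address] = data
        return "complete"  # host sees completion before the disk write happens

    def flush(self):
        """Move cached writes into the disk array (the slow step)."""
        self.disk.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        """Model a volatile cache: unflushed writes vanish. Returns lost addresses."""
        lost = sorted(self.cache)
        self.cache.clear()
        return lost

ctrl = WriteBackCache()
ctrl.write(10, b"x")
ctrl.flush()
ctrl.write(11, b"y")
lost = ctrl.power_loss()  # address 11 was acknowledged but never reached disk
```

The last line shows why the cache matters for fault tolerance: a write that the host believes is complete can still be lost if the cache memory is volatile.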
A large DRAM bank serves as the primary cache in most
RAID servers because DRAM currently provides the best available
combination of fast write time and low cost/bit. Low per-bit cost
is important because RAID memory caches are sometimes as large
as 32GB. Today, this DRAM is most likely to be double data rate,
second generation synchronous dynamic random access memory
(DDR2 SDRAM), which will quickly transition to DDR3 SDRAM
this year as sales volumes and semiconductor memory economics
start to favor DDR3 SDRAM over DDR2.
However, DRAM has a severe liability when used as RAID
memory cache: DRAM provides only volatile storage. If power
is lost, so is the data stored in the DRAM. Because RAID systems
hold mission-critical data and must provide fault-tolerant storage,
enterprise customers cannot tolerate this potential for data loss in
a world of less-than-perfect electrical power sources. Consequently,
RAID designers employ one of several methods to add non-volatile
storage to DRAM-based primary caches.
The first such method is to simply add a battery and power
controller to maintain power to the primary cache when power mains
fail. However, most battery systems used in RAID applications are
rated for no more than 72 hours of unpowered operation. After that,
data may be lost. Batteries also require maintenance. RAID back-up
batteries should be replaced annually, which is both an extra expense
and a potentially serious operational problem. It's not uncommon for
data center managers to be blissfully unaware that their RAID servers
contain deeply embedded batteries. Consequently, many RAID
back-up batteries are not serviced regularly, and mission-critical data
is at risk.
Figure 1 shows an alternative design approach—adding NAND
Flash memories to the primary cache—which also provides non-volatile
storage for the RAID system's primary cache. When the
RAID control processor detects a loss of main power, an inexpensive
back-up control processor in the primary cache independently copies
the contents of the cache's DRAM to the NAND Flash array. NAND
Flash is generally rated to safely hold the data for 10 years without
power. Battery power is only required for a short period while the
data is copied from DRAM to NAND Flash. Some designs dispense
with the battery and the associated maintenance requirements
and instead use low-maintenance ultracapacitors, which provide
the needed power for the short back-up interval. Using Flash as a
back-up memory layer in this configuration adds the cost of the
Flash memory itself, as well as the supporting back-up circuitry and
hardware, to the RAID system.
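
A back-of-the-envelope calculation shows how the length of the back-up interval determines the energy that the battery or ultracapacitor must supply. In the Python sketch below, only the 32GB cache size comes from this article; the Flash write bandwidth and the power draw are assumed round numbers picked for illustration, not measured or vendor figures.

```python
def backup_time_s(cache_bytes, flash_write_bytes_per_s):
    """Seconds needed to copy the whole DRAM cache into NAND Flash."""
    return cache_bytes / flash_write_bytes_per_s

def backup_energy_j(time_s, power_w):
    """Energy the hold-up source must deliver during the copy."""
    return time_s * power_w

CACHE_BYTES = 32e9        # 32GB cache, the upper size cited in this article
FLASH_WRITE_BPS = 200e6   # ASSUMED aggregate NAND Flash write bandwidth
BACKUP_POWER_W = 15.0     # ASSUMED draw of DRAM, Flash and back-up processor

t = backup_time_s(CACHE_BYTES, FLASH_WRITE_BPS)  # 160 seconds
e = backup_energy_j(t, BACKUP_POWER_W)           # 2400 joules
```

Under these assumptions the hold-up source has to last minutes rather than days, which is why a small energy store can be sufficient for this design.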
Two key characteristics prevent NAND Flash from being used
as the sole memory in primary RAID caches. First, NAND Flash
devices have relatively long write latencies due to their long erase-write
cycles. Second, NAND Flash devices deteriorate in direct
proportion to the number of erase-write cycles they endure. Most
NAND Flash devices are rated for only 100,000 or so erase-write
cycles before the serious onset of memory cell failures. Wear-leveling
techniques remediate NAND Flash wearout failures in SSD
applications, but these techniques are too slow to apply to a primary
RAID cache, which requires data throughput rates that are orders of
magnitude faster. These two traits greatly reduce the attractiveness
of NAND Flash for direct primary cache storage. For these reasons,
NAND Flash can serve as a DRAM back-up in the primary RAID
cache but cannot serve as the primary cache's main memory alone.
There's a significant opportunity to replace DRAM in the primary
RAID cache if a cost-competitive, non-volatile semiconductor
memory with DRAM's write speed and without NAND Flash's write
endurance problem becomes available.
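
The endurance limit can be made concrete with simple arithmetic. The Python sketch below estimates how long a NAND Flash array would last as the sole cache memory. The 100,000-cycle rating and the 32GB capacity come from this article; the sustained write rate is an assumed figure, and the model optimistically assumes ideal wear leveling with no write amplification.

```python
def wearout_seconds(capacity_bytes, endurance_cycles, write_bytes_per_s):
    """Time until every cell has used its rated erase-write cycles,
    assuming writes are spread perfectly evenly across the array."""
    total_writable_bytes = capacity_bytes * endurance_cycles
    return total_writable_bytes / write_bytes_per_s

CAPACITY_BYTES = 32e9       # 32GB cache size cited in this article
ENDURANCE_CYCLES = 100_000  # erase-write rating cited in this article
WRITE_BPS = 1e9             # ASSUMED sustained cache write rate (1 GB/s)

seconds = wearout_seconds(CAPACITY_BYTES, ENDURANCE_CYCLES, WRITE_BPS)
days = seconds / 86_400     # about 37 days
```

Even with these generous assumptions, the array reaches its rated endurance in weeks, not years.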
Beyond the primary RAID memory cache, two other places in
the RAID server block diagram require non-volatile memory—for firmware storage and for the journal/transaction/error log file.
Use of non-volatile memory such as read-only memory (ROM),
erasable programmable ROM (EPROM) and NOR Flash for
firmware storage is pervasive in most embedded systems, including
RAID servers. However, the log file is unique to storage applications.
Journaling file systems employ techniques from transaction
processing database systems to maintain the structural consistency
of the data stored in the RAID array by logging atomic disk input/output (I/O) transactions. Should a failure occur, such as the loss of
a drive in the disk array, replaying the transaction log from the last
file system checkpoint restores the RAID system's state. Depending
on the checkpoint frequency, the journal/transaction/error log file
need not be nearly as large as the primary RAID memory cache. A
few megabytes of storage are generally sufficient for both the journal/transaction/error log file and the RAID system's configuration data.
For obvious reasons, the memory that holds these files must be non-volatile.
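
The checkpoint-and-replay idea can be sketched as follows. This Python model is a simplified assumption, not the journaling scheme of any specific file system or RAID product: the class JournaledStore, its method names, and the use of dictionaries for state are hypothetical.

```python
class JournaledStore:
    """Toy journal: log each transaction, replay the log after a failure."""

    def __init__(self):
        self.state = {}       # live data, may be lost in a failure
        self.checkpoint = {}  # last consistent snapshot
        self.log = []         # transactions since the checkpoint (non-volatile)

    def apply(self, address, data):
        self.log.append((address, data))  # log the transaction first
        self.state[address] = data

    def take_checkpoint(self):
        """Snapshot the state; earlier log entries are no longer needed."""
        self.checkpoint = dict(self.state)
        self.log.clear()

    def recover(self):
        """Rebuild state from the last checkpoint plus the logged transactions."""
        state = dict(self.checkpoint)
        for address, data in self.log:
            state[address] = data
        self.state = state
        return state

store = JournaledStore()
store.apply(1, "a")
store.take_checkpoint()
store.apply(2, "b")
store.apply(1, "c")
store.state = {}            # simulate a failure that wipes the live state
restored = store.recover()  # {1: "c", 2: "b"}
```

Because the log is cleared at each checkpoint, it only has to hold the transactions since the last one, which is why frequent checkpoints keep the log small.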
Several semiconductor memory technologies vie for this socket:
NVSRAM pairs a six-transistor (6T) static RAM cell with a
silicon-oxide-nitride-oxide-silicon (SONOS) electrically erasable
programmable ROM (EEPROM) cell, replicating the SDRAM/NAND Flash pairing previously described but at the cell level. The
result is a fast SRAM array that can be backed up in one write cycle.
NVSRAM is currently the technology of choice for the non-volatile
memory in RAID systems (excluding the primary memory cache).
However, the NVSRAM memory cell is more than twice as large as a
6T SRAM cell, which itself is relatively large compared to DRAM or
NAND Flash memory. Consequently, NVSRAM storage is relatively
expensive on a cost/bit basis and will likely stay that way relative to
other memory technologies.
FRAM inserts a ferroelectric material, typically lead zirconate
titanate (PZT), into the semiconductor processing flow. The
ferroelectric material creates a bistable bit storage element that
operates at the molecular level. An electric field sets the physical position of one central atom trapped in an octahedron of oxygen
atoms. The central atom's position represents the stored bit state
within the PZT ferroelectric molecule. Commercial FRAMs have
been available for at least two decades, but like NAND Flash
memories, FRAMs also exhibit wearout failure. More importantly,
FRAM capacities remain small, and lithographic scaling may become
a severe problem because ferroelectric materials tend to lose their
ferroelectric properties when the amount of ferroelectric material
used drops below a threshold value.
Phase-change memory (PCM) first appeared as a cover story in
Electronics magazine nearly 40 years ago, but very few commercial
PCM devices have been introduced so far. PCM stores bits as
physical state changes in a chalcogenide glass that can take either
a crystalline or amorphous form. In crystalline form, chalcogenide
glass is a good electrical conductor. In the amorphous form, it's
not. The conductivity difference produces a usable memory cell.
Chalcogenide glass is the active material used for making recordable
CDs and DVDs, so its crystalline/amorphous properties are very well
understood by now. However, writing to a PCM cell literally involves
melting and annealing glass, so PCM write cycles aren't particularly
fast (approximately 100 microseconds at current lithographies), and
PCM storage retention drops quickly as the operating temperature
rises. Retention time for one vendor's prototype PCM cells is on
the order of 10 years at 85°C but only 10 seconds at 165°C and 10
microseconds at 225°C. Similar to NAND Flash and FRAMs, cycling
stresses cause wearout failure in PCMs, which have endurance ratings
of approximately 10^8 write cycles.
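
To give a feel for how steeply the quoted retention figures fall with temperature, the Python sketch below computes the activation energy implied by each pair of adjacent data points. It assumes that retention follows a simple Arrhenius relationship, t = t0 * exp(Ea / (k * T)); that model is an assumption introduced here for illustration and is not stated by the vendor data quoted above.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333e-5  # Boltzmann constant in eV/K

def implied_activation_energy_ev(t1_s, temp1_c, t2_s, temp2_c):
    """Activation energy implied by two (retention time, temperature) points,
    ASSUMING retention follows t = t0 * exp(Ea / (k * T))."""
    inv_t1 = 1.0 / (temp1_c + 273.15)
    inv_t2 = 1.0 / (temp2_c + 273.15)
    return BOLTZMANN_EV_PER_K * math.log(t1_s / t2_s) / (inv_t1 - inv_t2)

TEN_YEARS_S = 10 * 365.25 * 86_400

# Retention points quoted in the text: 10 years at 85 C, 10 s at 165 C,
# and 10 microseconds at 225 C.
ea_low = implied_activation_energy_ev(TEN_YEARS_S, 85.0, 10.0, 165.0)
ea_high = implied_activation_energy_ev(10.0, 165.0, 10e-6, 225.0)
```

Under this assumed model, the two pairs of points imply different activation energies (roughly 2.9 eV and 4.3 eV), so the quoted figures are best read as order-of-magnitude indicators rather than as points on a single fitted curve.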
MRAM stores data in magnetic material introduced into the
semiconductor cell. MRAM's big advantages are density, speed,
symmetrical read and write cycle times, and infinite write endurance.
The MRAM storage element, called a magnetic tunnel junction
(MTJ), consists of a sandwich of one fixed (or "pinned") magnetic
layer and one switchable magnetic layer separated by an insulating
layer to form a tunnel junction, as shown in Figure 2. Write currents
switch the magnetic orientation of the switchable layer with a
measurably different junction resistance depending on whether the
magnetic polarities of the fixed and switchable layers are aligned or
are opposed. The difference in resistance provides the readout of the
cell's state. Memory chips manufactured in first-generation MRAM
technologies are already available in the market.
Figure 2. MRAM Cell Design
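
The resistance-based readout can be sketched as a comparison against a reference value. In the Python sketch below, the two resistance values and the mapping of the high-resistance state to a logical 1 are assumptions chosen for illustration; real devices define their own values and bit conventions.

```python
R_ALIGNED_OHMS = 1000.0  # ASSUMED junction resistance, layers aligned (lower)
R_OPPOSED_OHMS = 2000.0  # ASSUMED junction resistance, layers opposed (higher)

def read_bit(measured_ohms):
    """Return the stored bit by comparing the measured junction resistance
    with a reference midway between the two states.

    Mapping the high-resistance state to 1 is an arbitrary convention here.
    """
    reference = (R_ALIGNED_OHMS + R_OPPOSED_OHMS) / 2
    return 1 if measured_ohms > reference else 0

bit_aligned = read_bit(1050.0)  # close to the aligned resistance -> 0
bit_opposed = read_bit(1900.0)  # close to the opposed resistance -> 1
```

Reading compares the resistance without switching the storage layer, so in this model a read does not disturb the stored state.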

Several memory vendors are also currently developing a new and
different sort of MRAM technology dubbed spin torque transfer
(STT). The STT memory cell employs special layers that uniformly
polarize the spin of the electrons flowing through the MTJ, and
the spin-polarized current imparts magnetic moment to the storage
layer depending on the direction of electron flow through the cell.
Significantly, an STT memory cell's write current shrinks with the
square of the linear lithographic dimension of the MTJ, which
suggests that shrinking lithographies will allow STT MRAMs to
achieve NOR Flash memory densities and perhaps even approach
single-level cell (SLC) NAND Flash memory densities. Note that
MRAM has infinite write endurance, unlike competing non-volatile
semiconductor memory technologies. Commercial STT MRAMs
should be available within the next few years.
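
The quadratic scaling of STT write current can be expressed in one line. The Python sketch below normalizes the write current to an assumed reference dimension; the 90 nm and 45 nm values are illustrative choices, not figures from any vendor's roadmap.

```python
def scaled_write_current(i_ref, d_ref_nm, d_nm):
    """Write current at MTJ dimension d_nm, given current i_ref at d_ref_nm,
    using the square-law scaling described in the text."""
    return i_ref * (d_nm / d_ref_nm) ** 2

# ASSUMED example: normalize the current to 1.0 at a 90 nm MTJ dimension.
relative_current = scaled_write_current(1.0, 90.0, 45.0)  # 0.25
```

Halving the linear dimension cuts the write current to one quarter in this model, which is the basis for the density argument made above.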
Table 1 compares significant attributes of various volatile and
non-volatile memory technologies.
Table 1. Attributes of Volatile and Non-volatile Memory
Technologies

If the RAID server's primary memory cache—currently
implemented with volatile DRAM—were inherently non-volatile,
there would be no need for smaller non-volatile memories to hold
the RAID server's firmware, configuration data and the journal log
file. NAND Flash could serve as the only memory a RAID server
needed if it had faster read and write cycle times and if it were not
susceptible to write endurance failures. Currently available and
soon-to-be available MRAMs are already the best candidates for the
RAID server's journal and log files. STT MRAM, when it becomes
available, will have all the attributes needed to serve all non-volatile
memory functions in the RAID server, including the primary
RAID cache.
About the Authors
Barry Hoberman has held management positions at several technology companies,
including founder and chief executive officer of inSilicon (now part of Synopsys)
and chief executive officer of Virtual Silicon. His primary focus is in strategy
and business development for semiconductors, semiconductor manufacturing and
semiconductor intellectual property (IP). He has 13 U.S. patents and holds two
B.S. degrees from the Massachusetts Institute of Technology. You can reach Barry
Hoberman at bhoberman@crocus-technology.com.
Steve Cliadakis has over 20 years of experience in business development, marketing
and product development for technology companies, with a concentration in
semiconductors and IP. He is the founder of Silicon Impact, providing business
development and strategy services for start-ups and well-established companies.
Cliadakis holds a B.E. in electrical engineering from the State University of New
York at Stony Brook and an MBA from Adelphi University in New York. You can
reach Steve at steve@siliconimpact.com.