GSA Forum

Power: A Business Problem

Paul van Besouw, President and CEO, Oasys Design Systems

Power has become the major challenge in delivering electronic systems that meet the business objectives of the end user and the semiconductor components company. It is now almost a cliché to say "power is the new timing."

In portable devices such as wireless phones and tablet computers, power is clearly important. Power translates directly into parameters such as standby time and talk time that consumers use in making product choices. However, power is also important in tethered applications, including servers and routers. Increasingly, the capacity of a datacenter is no longer limited by the physical size of the equipment, but rather by the ability to bring power into the building and transfer heat out. Even in less aggressive tethered systems such as set-top boxes (STBs), power is a primary driver of acceptability. STBs sit in people's living rooms and, typically, should not have noisy fans. Furthermore, they must continue to work if the cat decides that the top of the warm digital video recorder (DVR) is the perfect place for a nap.

More subtly, the power budget is the primary limitation on performance. This is most obvious in the chips from companies such as Intel, NVIDIA and Advanced Micro Devices (AMD) that are placed into all types of computers, from notebooks and personal computers (PCs) to servers. For several years now, it has not been possible to increase the raw performance of individual cores because of power constraints, and the number of cores on a chip is often limited as much by power considerations as by area.

The current champion on the power front for individual chips seems to be IBM's experimental 3D chips, which combine server processors, memories and an interposer that connects it all, distributes power and decouples noise. Everything sits in the same package under a huge heat sink. These chips dissipate 100–200 watts (W). With a 1 volt (V) power supply, this means supplying 100–200 amps. To put that into perspective, that is the typical current used for arc welding, although welding uses a higher voltage.
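The current figure follows directly from P = V × I. A minimal sketch of that arithmetic, using only the 100–200 W and 1 V values quoted above (the function name is an illustrative choice, not something from the article):

```python
def supply_current_amps(power_watts: float, supply_volts: float) -> float:
    """Current drawn from the supply, from P = V * I."""
    if supply_volts <= 0:
        raise ValueError("supply voltage must be positive")
    return power_watts / supply_volts


# The two ends of the power range quoted in the text, at a 1 V supply.
for power in (100.0, 200.0):
    amps = supply_current_amps(power, 1.0)
    print(f"{power:.0f} W at 1.0 V -> {amps:.0f} A")
```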

This power ceiling has led more and more chips to deliver computing power not in the form of a single high-performance processor, but in multiple cores. The cores may be heterogeneous, with different types of cores and functionality statically partitioned among them. This has been the case with cellphones since signal processing logic was replaced with digital signal processors (DSPs). The alternative is the multi-core symmetrical multi-processing (SMP) architecture. Here, multiple identical cores can be dynamically scheduled depending on the workload. However, the difficulty of programming multi-core designs has been completely underestimated by processor manufacturers. In the market for servers (primarily those delivering website content), the different parallel users can be spread among the cores and do not require any subtle analysis. But in single-user systems such as PCs or smartphones, Amdahl's law comes into effect. When designing parallel computers in the late 1960s, Gene Amdahl observed that performance was limited not by what could be made parallel, but by what could not. If, for example, 10 percent of a workload could not be parallelized, then the maximum speedup would be asymptotically limited to 10 times, even if the parallelized part of the workload ran infinitely fast; with practical speedups, the gain would be significantly less.
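Amdahl's observation is usually written as speedup = 1 / (s + (1 − s) / N), where s is the serial fraction and N the number of cores. A short sketch using the 10 percent serial fraction from the example above (the function name and the core counts in the loop are illustrative choices):

```python
def amdahl_speedup(serial_fraction: float, n_cores: float) -> float:
    """Speedup predicted by Amdahl's law for a given serial fraction."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


# With 10 percent serial work, the speedup approaches but never reaches 10x.
for cores in (2, 4, 16, 64, 1024):
    print(f"{cores:5d} cores -> {amahl:.2f}x" if False else
          f"{cores:5d} cores -> {amdahl_speedup(0.10, cores):.2f}x")
```

Even at four cores the predicted gain is only about 3x, which is why the serial portion of a single-user workload matters long before the core count gets large.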

Right now this is not an enormous problem since multi-core designs typically contain only two or four cores, so there is a lot of scope for finding concurrent functionality. But a corollary of Moore's law is that the number of cores on a chip is increasing exponentially too. It is not obvious yet since the number of cores is on the flat part of the curve. For a general-purpose program, how to make use of large numbers of cores remains an open question; and the biggest worry is that this is a question without a solution.

One area where the number of cores has left the flat part of the curve is the design of graphics processors. These address a so-called "embarrassingly parallel" application where there are almost arbitrary amounts of workload that can be run concurrently, since there is very little interaction between different parts of the workload. Even here, power is one of the big constraints on the amount of graphics processing that can be delivered. Further, the inability of the main processor to deliver data fast enough, again because of power limitations, means that graphics processors cannot always maintain a full load. Although there are some other issues, such as designing low-power motors for fans and disk drives, most of these big-picture constraints come down to one: designing low-power ICs for the semiconductor content of these systems.

In the past, process and chip designers had a big weapon for reducing power: Reduce the power supply voltage at each process node. Since the voltage is squared in the power equations, a small change in voltage can have an outsize effect in reducing power. For various technical reasons associated with leakage current and noise immunity, it is no longer possible to reduce the power supply voltage at each process node to compensate for higher clock frequencies and higher component counts.
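The "voltage is squared" remark refers to the usual first-order model for CMOS switching power, P ≈ α · C · V² · f. The sketch below assumes that model and ignores leakage and short-circuit power; the function names and the 10 percent voltage reduction are illustrative, not taken from the article.

```python
def dynamic_power_watts(activity: float, capacitance_farads: float,
                        supply_volts: float, frequency_hz: float) -> float:
    """First-order switching power: activity * C * V^2 * f (leakage ignored)."""
    return activity * capacitance_farads * supply_volts ** 2 * frequency_hz


def relative_power(voltage_scale: float) -> float:
    """Switching power relative to nominal when only the supply is scaled."""
    return voltage_scale ** 2


# Dropping the supply by 10 percent, with everything else held constant.
saving = 1.0 - relative_power(0.9)
print(f"10% lower supply -> about {saving:.0%} less switching power")
```

Under this model a 10 percent voltage reduction saves roughly 19 percent of switching power, which is the outsize effect that is no longer available at each new node.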

The tools and design approaches for reducing power are somewhat limited. Power is a chip-level issue. Obviously, there is a sense in which the power dissipated is the sum of the power of all the blocks (or even all the gates and interconnect), but this is not very useful since it is difficult to guess what the power limit for an individual block should be. There are a number of proven approaches to reducing power, most of which have some level of support in current design flows:

  • Multi-threshold-voltage libraries: Since leakage current is an increasing part of the power problem, it is good practice to use high-threshold, low-leakage, low-performance cells on non-critical nets and keep the low-threshold, high-leakage, high-performance cells for the timing-critical parts of the design. The synthesis tool can make the selection automatically, based on the criticality of the nets.
  • Voltage domains: Some parts of the design are much more critical for power or timing than others, so the design can be divided into separate voltage domains with different performance/power tradeoffs. The Common Power Format (CPF) from Si2 and Accellera's Unified Power Format (UPF) standards are a way to capture this policy. Power-aware synthesis can read these files, automatically infer the required level shifters between voltage domains, and take account of the impact on timing and power.
  • Voltage areas: Voltage areas can be powered down, but the decision to do so is made above the level of the system-on-chip (SoC), typically by high-level control software (for example, is a phone call in progress?). Synthesis tools cannot automatically decide to power down an area, but they can take into account the timing impact of the isolation cells described in the CPF or UPF.
  • Clock gating: In the distant past, the golden rule was never to gate a clock. Instead, a register containing an unchanging value was looped back to its input through a multiplexor. Today, in the low-power era, that structure is best replaced by the synthesis tool with a gated clock, especially if the register is large (since the clock can be gated for the entire register rather than for each flop in the register).
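The multi-threshold idea in the first bullet can be illustrated with a toy model: start a path with fast, leaky cells and swap stages to slow, low-leakage cells for as long as the path still meets its required time. This is only a sketch of the principle, not how any particular synthesis tool works; the cell names, delays and leakage values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CellVariant:
    name: str
    delay_ps: float    # hypothetical stage delay
    leakage_nw: float  # hypothetical leakage power


# Hypothetical library: the high-threshold cell is slower but leaks far less.
HVT = CellVariant("hvt", delay_ps=60.0, leakage_nw=1.0)
LVT = CellVariant("lvt", delay_ps=40.0, leakage_nw=10.0)


def assign_thresholds(n_stages: int, required_ps: float) -> list[CellVariant]:
    """Greedily swap stages to HVT while the path delay stays within budget."""
    stages = [LVT] * n_stages
    for i in range(n_stages):
        trial = stages[:i] + [HVT] + stages[i + 1:]
        if sum(cell.delay_ps for cell in trial) <= required_ps:
            stages = trial
    return stages


# A critical path, a path with some slack, and a path with plenty of slack.
for required in (200.0, 260.0, 400.0):
    chosen = assign_thresholds(5, required)
    leakage = sum(cell.leakage_nw for cell in chosen)
    names = [cell.name for cell in chosen]
    print(f"required {required:.0f} ps -> {names}, leakage {leakage:.0f} nW")
```

The critical path keeps all of its low-threshold cells, while paths with slack trade speed they do not need for lower leakage.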

The challenge with traditional synthesis tools is that there is really no good way to partition the power budget among the large number of blocks that traditional methodologies create for a chip in a leading-edge process node such as 45 nanometer or 28 nanometer. Even if a reasonable division of the power budget among the various parts of the design is found, iterating the design to analyze different "what-if" scenarios is too slow.

Figure 1. Physical Synthesis

Traditional design flows cannot handle power constraints efficiently.

For example, imagine a block in its own voltage island. Lowering the voltage to that island has the potential to save a lot of power. But it is not feasible simply to lower the voltage and re-run timing analysis. If the synthesis tool has done a passable job, the block will only just meet timing at the original voltage (otherwise slower, lower-power cells could have been chosen), so it will fail timing at the lower voltage and the whole block must be re-synthesized.
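To see why a block that only just meets timing cannot simply be run at a lower voltage, the sketch below uses the alpha-power law, a common approximation in which gate delay scales as V / (V − Vth)^α. The model choice, the threshold voltage of 0.3 V, α = 1.3, and the path and clock numbers are all assumptions made for illustration; none of them come from the article.

```python
def relative_delay(supply_volts: float, threshold_volts: float = 0.3,
                   alpha: float = 1.3) -> float:
    """Gate delay up to a constant factor, per the alpha-power law (assumed)."""
    if supply_volts <= threshold_volts:
        raise ValueError("supply must exceed the threshold voltage")
    return supply_volts / (supply_volts - threshold_volts) ** alpha


def meets_timing(path_delay_ns: float, clock_period_ns: float,
                 nominal_volts: float, new_volts: float) -> bool:
    """Rescale a path delay measured at nominal_volts and check it still fits."""
    scale = relative_delay(new_volts) / relative_delay(nominal_volts)
    return path_delay_ns * scale <= clock_period_ns


# Hypothetical block: critical path of 0.98 ns against a 1.0 ns clock at 1.0 V.
for volts in (1.0, 0.9):
    ok = meets_timing(0.98, 1.0, nominal_volts=1.0, new_volts=volts)
    print(f"{volts:.1f} V -> {'meets' if ok else 'fails'} timing")
```

With these assumed parameters a 10 percent voltage drop slows the path by roughly 10 percent, far more than the small slack a well-synthesized block has left.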

Worse than that, the time budgeting should also be updated and the surrounding blocks re-synthesized. Further, since timing is totally dependent on placement, especially at the most advanced nodes, the whole placement of the design needs to be redone to ensure confidence in the analysis. For a typical design using traditional block-based approaches, analysis for a single case can take days or even weeks.

Since power is a chip-level problem, it needs to be addressed at the chip level. Chip synthesis operates at either the level of the entire chip or over a handful of largely independent large blocks (microprocessors, memory subsystems, etc.). Experimenting with the voltage of a particular block (voltage island) means resynthesizing the whole chip from the register transfer level (RTL) and generating an updated placement, so that the ramifications of the voltage change propagate throughout the chip in an optimal manner. If the analysis of the change looks good, an updated CPF or UPF file is needed: either created as input to the chip synthesis tool or, if the voltage change was made interactively inside the tool, written back out.

Figure 2. Chip Level

Since power is a chip-level problem, it needs to be addressed at the chip level with chip synthesis.

One challenge in modern IC design is that chip-level issues need to be handled at the chip level. Increasingly, approaches that divide the design into large numbers of blocks to be handled independently are not efficient because it is not possible to have good budgets for timing, power, area, congestion and so on between the various blocks. While each block may be implemented efficiently on its own, the reassembly of these blocks to form the whole chip is far from optimal. Additionally, power is the new timing in electronic systems and has become a business problem.

Figure 3. Chip Synthesis

Chip synthesis is faster and more efficient.

For chip-level problems such as power, chip synthesis is the way to escape from this increasingly ineffective method of design.

About the Author

Paul van Besouw is president, chief executive officer and co-founder of Oasys. Mr. van Besouw has an extensive technical and management background in the electronics industry. Prior to Oasys, van Besouw was responsible for managing the synthesis and physical synthesis teams at Cadence. Prior to Cadence, he was one of the first members of the Ambit Design Systems engineering team, where he led the development of the RTL and datapath synthesis technology. He holds a Master of Science in electrical and computer engineering from the Eindhoven University of Technology in the Netherlands.
