Power: A Business Problem
Paul van Besouw, President and CEO, Oasys Design Systems
Power has become the major challenge in delivering electronic
systems that meet the business objectives of the end user and
the semiconductor components company. It is now almost a
cliché to say "power is the new timing."
In portable devices such as wireless phones and tablet computers,
power is clearly important. Power translates directly into parameters
such as standby time and talk time that consumers use in making
product choices. However, power is also important in tethered
applications, including servers and routers. Increasingly, the capacity of
a datacenter is no longer limited by the physical size of the equipment,
but rather by the ability to bring power into the building and transfer
heat out. Even in less aggressive tethered systems such as set-top boxes
(STBs), power is a primary driver of acceptability. STBs sit in people's
living rooms and, typically, should not have noisy fans. Furthermore,
they must continue to work if the cat decides that the top of the warm
digital video recorder (DVR) is the perfect place for a nap.
More subtly, the power budget is the primary limitation on
performance. This is most obvious in the chips from companies
such as Intel, NVIDIA and Advanced Micro Devices (AMD) that
are placed into all types of personal computers (PCs) (e.g., servers,
notebooks, etc.). For several years now, it has not been possible
to increase the raw performance of individual cores for power
reasons, and the number of cores on a chip is often limited as much
by power considerations as by area.
The current champion on the power front for individual chips
seems to be IBM's experimental 3D chips with server processors,
memories and an interposer to connect it all, distribute power and
decouple noise. It is all in the same package under a huge heat sink.
These dissipate 100–200 watts (W). With a 1-volt (V) power supply, this
means supplying 100–200 amps. To put that into perspective, that is
the typical current used for arc welding, although at a higher voltage.
This power ceiling constraint has led to more and more chips
delivering computing power not in the form of a single high-performance
processor, but in multiple cores. The cores may be
heterogeneous, with different types of cores and functionality
statically partitioned among them. This has been the case with
cellphones since signal processing logic was replaced with digital
signal processors (DSPs). The alternative is the multi-core symmetric
multiprocessing (SMP) architecture. Here, multiple identical cores
can be dynamically scheduled depending on the workload. However,
processor manufacturers have badly underestimated the difficulty of
programming multi-core designs. In the market for servers
(primarily those delivering Website content), the different parallel
users can be spread among the cores and do not require any subtle
analysis. But in single-user systems such as PCs or smartphones,
Amdahl's law comes into effect. In 1967, Gene Amdahl observed that
the performance of a parallel computer was limited not by what could
be made parallel, but by what could not. If,
for example, 10 percent of a workload could not be parallelized, then
the maximum speedup would be asymptotically limited to 10 times
if the parallelized part of the workload ran infinitely fast; and with
practical speedups, this would be significantly less.
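Amdahl's observation is usually written as S(N) = 1 / (s + (1 − s)/N), where s is the fraction of the workload that must run serially and N is the speedup of the parallel part (for example, the number of cores). The short Python sketch below simply evaluates that formula; the function name is our own, and it ignores real-world overheads such as communication and synchronization, which only make the speedup smaller.

```python
import math


def amdahl_speedup(serial_fraction: float, n_cores: float) -> float:
    """Overall speedup when the parallelizable part runs n_cores times faster.

    serial_fraction is the share of the workload that cannot be parallelized.
    Overheads such as synchronization are ignored, so this is an upper bound.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if n_cores < 1:
        raise ValueError("n_cores must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cores)


if __name__ == "__main__":
    # 10 percent of the workload is serial, as in the example in the text.
    for n in (2, 4, 16, 64, 1024, math.inf):
        print(f"{n:>6} cores: {amdahl_speedup(0.10, n):.2f}x")
```

With 10 percent serial work, 4 cores give roughly a 3.1x speedup and 64 cores only about 8.8x; as N grows without bound the speedup approaches, but never exceeds, 10x.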
Right now this is not an enormous problem since multi-core
designs typically contain only two or four cores, so there is a lot of
scope for finding concurrent functionality. But a corollary of Moore's
law is that the number of cores on a chip is increasing exponentially
too. It is not obvious yet since the number of cores is on the flat part
of the curve. For a general-purpose program, how to make use of
large numbers of cores remains an open question; and the biggest
worry is that this is a question without a solution.
One area where the number of cores has left the flat part of the
curve is the design of graphics processors. These address so-called
"embarrassingly parallel" applications, in which an almost arbitrary
amount of work can run concurrently because there is very little
interaction between different parts of the workload. Even here, power
is one of the big constraints on the amount of graphics processing
that can be delivered. Further, because power limits prevent the main
processor from delivering data fast enough, graphics processors cannot
always maintain a full load.
Although there are some other issues, such as designing low-power
motors for fans and disk drives, most of these big-picture constraints
come down to one: designing low-power ICs for the semiconductor
content of these systems.
In the past, process and chip designers had a big weapon for
reducing power: reduce the power supply voltage at each process
node. Since the supply voltage appears squared in the dynamic power
equation, a small change in voltage has an outsize effect on power.
For various technical reasons associated with leakage current and
noise immunity, it is no longer possible to reduce the power supply
voltage at each process node to compensate for higher clock
frequencies and higher component counts.
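The "voltage squared" relationship refers to the standard first-order expression for dynamic (switching) power, P = αCV²f, where α is the switching activity, C the switched capacitance, V the supply voltage and f the clock frequency. The Python sketch below uses made-up illustrative values, not data from any real chip, and it ignores leakage power, which is exactly the component that has stopped voltage scaling.

```python
def dynamic_power_w(activity: float, capacitance_f: float,
                    vdd_v: float, freq_hz: float) -> float:
    """First-order switching power P = alpha * C * V^2 * f, in watts."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


if __name__ == "__main__":
    # Illustrative values only: 10% activity, 1 nF switched capacitance, 1 GHz.
    p_nominal = dynamic_power_w(0.1, 1e-9, 1.0, 1e9)
    p_scaled = dynamic_power_w(0.1, 1e-9, 0.9, 1e9)
    # A 10 percent supply reduction cuts dynamic power by about 19 percent.
    print(f"{p_nominal:.3f} W -> {p_scaled:.3f} W "
          f"({100 * (1 - p_scaled / p_nominal):.0f}% saving)")
```

Because V enters squared while α, C and f enter linearly, lowering the supply was historically the cheapest lever on dynamic power.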
The tools and design approaches for reducing power are somewhat
limited. Power is a chip-level issue. Obviously, there is a sense in which
the power dissipated is the sum of the power of all the blocks (or even
all the gates and interconnect), but this is not very useful since it is
difficult to guess what the power limit for an individual block should
be. There are a number of proven approaches to reducing power, most
of which have some level of support in current design flows:
- Multi-voltage threshold libraries: Since leakage current is an
increasing part of the power problem, it is good practice to use
high-threshold, low-leakage, low-performance cells on non-critical
nets and keep the low-threshold, high-leakage, high-performance
cells for the timing-critical parts of the design. The
synthesis tool can make the selection dynamically, based on the
criticality of the nets.
- Voltage domains: Some parts of the design are much more
critical for power or timing than others, so the design can
be partitioned into separate voltage domains with different
performance/power tradeoffs. The Common Power Format
(CPF) from Si2 and Accellera's Universal Power Format
(UPF) standards are a way to capture this policy. Power-aware
synthesis can read these files, automatically infer the required
level shifters between voltage domains, and take account of the
impact on timing and power.
- Voltage areas: Voltage areas can also be powered down, but that
decision is made above the level of the system-on-chip (SoC),
typically by high-level control software (is a phone call in
progress?). Synthesis tools cannot automatically decide to power
down an area, but they can take into account the timing impact of
the isolation cells described in the CPF or UPF.
- Clock gating: In the distant past, the golden rule was never
to gate a clock. Instead, a register containing an unchanging
value was looped back to its input through a multiplexor.
Today, in the low-power era, that structure is best replaced by
the synthesis tool with a gated clock, especially if the register is
large (since the clock can be gated for the entire register rather
than for each flop in the register).
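To make the multi-threshold idea concrete, here is a toy Python sketch of greedy threshold-voltage assignment. It is not how any particular synthesis tool works: the Cell type, the path data structure, and the per-cell delay penalties and leakage savings are all hypothetical, and a real tool would recompute slack with full static timing analysis (including slew and load effects). Every cell starts as a low-threshold cell; a cell is swapped to high-threshold only if every timing path through it still has non-negative slack after absorbing the extra delay.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Cell:
    name: str
    hvt_penalty_ps: float     # hypothetical extra delay if swapped to high-Vt
    leakage_saving_nw: float  # hypothetical leakage saved by the swap


def assign_high_vt(cells: list[Cell], paths: dict[str, list[str]],
                   lvt_slack_ps: dict[str, float]) -> set[str]:
    """Greedily swap low-Vt cells to high-Vt without creating negative slack.

    paths maps each timing path to the names of the cells on it;
    lvt_slack_ps gives each path's slack with every cell at low-Vt.
    """
    slack = dict(lvt_slack_ps)
    # Try cells with the best leakage saving per picosecond of penalty first.
    ordered = sorted(cells,
                     key=lambda c: c.leakage_saving_nw / max(c.hvt_penalty_ps, 1e-9),
                     reverse=True)
    swapped: set[str] = set()
    for cell in ordered:
        through = [p for p, members in paths.items() if cell.name in members]
        if all(slack[p] >= cell.hvt_penalty_ps for p in through):
            for p in through:
                slack[p] -= cell.hvt_penalty_ps  # each path absorbs the extra delay
            swapped.add(cell.name)
    return swapped


if __name__ == "__main__":
    cells = [Cell("u1", 10, 5), Cell("u2", 10, 5), Cell("u3", 10, 5), Cell("u4", 10, 5)]
    paths = {"critical": ["u1", "u2"], "relaxed": ["u3", "u4"]}
    slack = {"critical": 5.0, "relaxed": 100.0}
    print(sorted(assign_high_vt(cells, paths, slack)))  # ['u3', 'u4']
```

The cells on the critical path stay low-threshold because either swap would push that path negative, while both cells on the relaxed path become high-threshold and save leakage.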
The challenge with traditional synthesis tools is that there is really
no good way to partition the power budget among the large number
of blocks that traditional methodologies create for a chip in a
leading-edge process node such as 45 nanometers or 28 nanometers.
Even if a reasonable division of the power budget among the various
parts of the design is found, iterating the design to analyze different
"what-if" scenarios is too slow.
Figure 1. Physical Synthesis

Traditional design flows cannot handle power constraints efficiently.
For example, imagine a block in its voltage island. Lowering the
voltage to that island has the potential to save a lot of power. But it
is not enough to lower the voltage and re-run timing analysis. If
the synthesis tool has done a passable job, the block will only just
meet timing (otherwise, slower, lower-power cells could have been
chosen). Lowering the voltage slows every cell, so the block will
now fail timing and must be re-synthesized at the lower voltage.
Worse than that, the timing budgets should also be updated and
the surrounding blocks re-synthesized. Further, since timing is heavily
dependent on placement, especially at the most advanced nodes, the
whole design needs to be re-placed to ensure confidence in the
analysis. For a typical design using traditional block-based
approaches, analysis for a single case can take days or even weeks.
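Why a block that only just meets timing fails at a lower voltage can be sketched with the alpha-power-law delay model of Sakurai and Newton, in which gate delay scales roughly as V/(V − Vt)^α. The Python below is an illustration under assumed numbers: the threshold voltage, α, path delay and clock period are hypothetical, the model is first-order, and it models only the naive "lower the voltage and re-run timing" experiment with a frozen netlist, not the re-synthesis and re-placement the text says a real what-if analysis needs.

```python
from __future__ import annotations


def delay_scale(vdd: float, vdd_nom: float, vt: float, alpha: float) -> float:
    """Delay at vdd relative to delay at vdd_nom under the alpha-power law."""
    if min(vdd, vdd_nom) <= vt:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return (vdd / (vdd - vt) ** alpha) / (vdd_nom / (vdd_nom - vt) ** alpha)


def what_if(path_delay_ps: float, clock_ps: float, vdd_nom: float,
            candidates: list[float], vt: float = 0.3, alpha: float = 1.3):
    """Return (vdd, slack_ps, relative_dynamic_power) for each candidate supply.

    Cells are NOT re-sized: the netlist is frozen and only the supply changes.
    vt and alpha defaults are illustrative assumptions, not library data.
    """
    results = []
    for vdd in candidates:
        delay = path_delay_ps * delay_scale(vdd, vdd_nom, vt, alpha)
        results.append((vdd, clock_ps - delay, (vdd / vdd_nom) ** 2))
    return results


if __name__ == "__main__":
    # Hypothetical block: 980 ps critical path, 1 ns clock, nominal 1.0 V.
    for vdd, slack, power in what_if(980.0, 1000.0, 1.0, [1.0, 0.95, 0.9]):
        status = "meets timing" if slack >= 0 else "FAILS timing"
        print(f"{vdd:.2f} V: slack {slack:+7.1f} ps, power x{power:.2f}, {status}")
```

In this toy case only the nominal 1.0 V point meets timing: the roughly 19 percent dynamic-power saving at 0.9 V is reachable only if the block is re-synthesized, and re-placed, for the lower voltage.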
Since power is a chip-level problem, it needs to be addressed
at the chip level. Chip synthesis operates at either the level of the
entire chip or over a handful of largely independent large blocks
(microprocessors, memory subsystems, etc.). Experimenting with the
voltage of a particular block (voltage island) requires resynthesizing
the whole chip from the register transfer level (RTL) and generating
an updated placement, so the effects of the voltage change are
propagated and optimized throughout the chip. If the analysis
of the change looks good, then updated CPF or UPF needs to be
created either as input to the chip synthesis tool or, if the voltage
change was made interactively inside the tool, written back out.
Figure 2. Chip Level

Since power is a chip-level problem, it needs to be addressed at the chip level with chip synthesis.
One challenge in modern IC design is that chip-level issues need
to be handled at the chip level. Increasingly, approaches that divide
the design into large numbers of blocks to be handled independently
are not efficient because it is not possible to have good budgets
for timing, power, area, congestion and so on between the various
blocks. While each block may be implemented efficiently on its own,
the reassembly of these blocks to form the whole chip is far from
optimal. Additionally, power is the new timing in electronic systems
and has become a business problem.
Figure 3. Chip Synthesis

Chip synthesis is faster and more efficient.
When dealing with chip-level problems such as power, chip synthesis
is the way to escape from the increasingly ineffective block-based
method of design.
About the Author
Paul van Besouw is president, chief executive officer and co-founder of
Oasys. Mr. van Besouw has an extensive technical and management
background in the electronics industry. Prior to Oasys, van Besouw was
responsible for managing the synthesis and physical synthesis teams at
Cadence. Prior to Cadence, he was one of the first members of the Ambit
Design Systems engineering team, where he led the development of the
RTL and datapath synthesis technology. He holds a Master of Science in
electrical and computer engineering from the Eindhoven University of
Technology in the Netherlands.