GSA Forum

New Benchmarks for Communications Processors

William McDonald, Director, CPE and Optical Product Marketing, TranSwitch Corporation

Currently deployed broadband access gateways usually support an ADSL2/2+ wide area network (WAN) interface, multiple Fast Ethernet local area network (LAN) interfaces and Institute of Electrical and Electronics Engineers (IEEE) 802.11a/b/g Wi-Fi. Typical maximum data rates for an ADSL2/2+ line are 24Mbps downstream and 1.4Mbps upstream. IEEE 802.11a/b/g Wi-Fi typically supports a maximum raw data rate of 54Mbps or about 19Mbps net throughput. As a result, these gateways are optimized for sub-100Mbps data throughput performance.

With the advent of higher bandwidth broadband access technologies, such as very high-speed digital subscriber line (VDSL), passive optical network (PON), IEEE 802.11n Wi-Fi and 3G/4G wireless, and the rollout of triple- and quad-play services, end-user demand for bandwidth has increased markedly, from tens to hundreds of megabits per second. This increase in demand has spurred the industry to introduce a new generation of communications processors that support packet processing at gigabit rates. This article identifies the features that these processors must support to meet these new bandwidth demands, while simultaneously complying with seemingly conflicting requirements for reduced power consumption and bill of materials (BOM) cost.

Performance by Design

A gigabit-rate communications processor's architecture and components, including its central processing unit (CPU) cores, on-chip memory, hardware acceleration engines, bus structure and external memory controller, must be selected not only for performance, but also for minimal cost and power consumption. As a result, the designer must understand the tradeoffs associated with each of these selections.

On the surface, it may seem obvious that the price that one must pay for higher performance is higher power consumption (i.e., a CPU core that supports hundreds of megabits per second of data throughput would consume ten times the power of a CPU core that supports tens of megabits per second). However, this isn't necessarily the case; not all CPUs are created equal! Single-issue CPUs, such as the ARM11, support the execution of only one instruction per clock cycle; dual-issue CPUs, such as the MIPS64K, support two instructions per clock cycle; and triple-issue CPUs, such as the Tensilica Xtensa LX2 CPU, support three instructions per clock cycle. Because of this, for a given clock rate, the Xtensa LX2 can support up to three times the peak instruction throughput of an ARM11 while consuming a comparable amount of power!
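The issue-width argument can be put as a back-of-the-envelope calculation. The sketch below treats issue width multiplied by clock rate as an upper bound on instruction throughput. The 400MHz clock is an assumed value chosen only for illustration, and sustained throughput will be lower whenever the workload cannot fill every issue slot.

```python
def peak_mips(issue_width: int, clock_mhz: float) -> float:
    """Upper bound on millions of instructions per second (MIPS)."""
    return issue_width * clock_mhz


CLOCK_MHZ = 400.0  # hypothetical clock rate, not a figure from the article

for name, width in [("single-issue", 1), ("dual-issue", 2), ("triple-issue", 3)]:
    print(f"{name}: {peak_mips(width, CLOCK_MHZ):.0f} MIPS peak")
```

At equal clock rates, the ratio between the peaks is simply the ratio of the issue widths, which is where the "up to three times" comparison comes from.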

Routers typically support what is referred to as “fast-path” and “slow-path” processing. A packet whose next hop address is already populated in the routing table can be forwarded immediately; it takes the “fast path.” A packet whose next hop address is unknown must wait for the router to make the shortest path decision before it can be forwarded; it takes the “slow path.” A dual-core architecture with shared memory is a good choice for a communications processor acting as a gateway router since it can run both of these processes simultaneously on different CPU cores. One CPU can be dedicated to fast-path processing while the other CPU can be used for slow-path processing, as well as local and remote management and other applications.
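The split between the two paths can be sketched as a table lookup with a fallback. This is a minimal illustration, not the design of any particular processor: the `Gateway` class, its method names and the placeholder routing rule are all hypothetical, and a Python dictionary stands in for the routing table held in shared memory.

```python
from dataclasses import dataclass, field


@dataclass
class Gateway:
    """Toy model of fast-path/slow-path forwarding (all names are hypothetical)."""
    next_hop: dict = field(default_factory=dict)  # shared table: destination -> next hop
    fast_count: int = 0
    slow_count: int = 0

    def resolve_route(self, dst: str) -> str:
        """Slow path: stand-in for the routing decision made on the second CPU."""
        hop = "gw-" + dst.split(".")[0]  # placeholder rule, not a real routing protocol
        self.next_hop[dst] = hop         # populate the shared table for later packets
        return hop

    def forward(self, dst: str) -> str:
        """Forward one packet, taking the fast path when the next hop is known."""
        hop = self.next_hop.get(dst)
        if hop is not None:
            self.fast_count += 1
            return hop
        self.slow_count += 1
        return self.resolve_route(dst)


gw = Gateway()
hops = [gw.forward(d) for d in ["10.0.0.1", "10.0.0.1", "192.168.1.5", "10.0.0.1"]]
print(hops, "fast:", gw.fast_count, "slow:", gw.slow_count)
```

Only the first packet to each destination takes the slow path; every later packet to the same destination is served from the table.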

The fast-path CPU must be optimized for maximum packet throughput while keeping cost and power consumption in check. Because of this, attention must be paid not only to the design of the CPU core itself, but also to the size of the fast-path code base relative to the size of the L1 cache. In other words, the L1 cache must be sized to accommodate the entire fast-path code base so that no accesses to external memory are required other than the retrieval of the header of the next packet to be processed. External memory accesses delay the execution of the fast-path code, resulting in a reduction in packet throughput. On-chip memory is expensive, however, so the L1 cache must be kept small, preferably 16KB to 32KB. Both objectives can be met through the use of reduced-size instructions (e.g., 16-bit and/or 24-bit rather than 32-bit instructions), which enable up to two times the code density of traditional CPU designs.
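To see how instruction width affects whether the fast-path code fits in the L1 cache, consider the rough estimate below. The instruction count (6,000) and the half-and-half mix of 16-bit and 24-bit instructions are assumptions made purely for illustration. The sketch also assumes that the reduced-size encoding needs the same number of instructions as the 32-bit encoding, which may not hold in practice.

```python
def code_bytes(instr_count: int, mix: dict) -> float:
    """Code footprint in bytes for an instruction mix {width_in_bits: fraction}."""
    assert abs(sum(mix.values()) - 1.0) < 1e-9, "fractions must sum to 1"
    avg_bytes = sum(bits / 8 * frac for bits, frac in mix.items())
    return instr_count * avg_bytes


def fits_in_cache(instr_count: int, mix: dict, cache_kb: int) -> bool:
    """True if the whole code base fits in a cache of cache_kb kilobytes."""
    return code_bytes(instr_count, mix) <= cache_kb * 1024


INSTRUCTIONS = 6_000           # hypothetical fast-path code size, in instructions
FIXED_32 = {32: 1.0}           # traditional fixed-width encoding
MIXED = {16: 0.5, 24: 0.5}     # hypothetical reduced-size instruction mix

for label, mix in [("32-bit only", FIXED_32), ("16/24-bit mix", MIXED)]:
    size = code_bytes(INSTRUCTIONS, mix)
    print(f"{label}: {size:.0f} bytes, "
          f"fits 16KB: {fits_in_cache(INSTRUCTIONS, mix, 16)}, "
          f"fits 32KB: {fits_in_cache(INSTRUCTIONS, mix, 32)}")
```

Under these assumptions the 32-bit encoding needs the larger 32KB cache, while the mixed encoding fits in 16KB.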

The “Holy Grail” of fast-path CPU performance is gigabit-rate, wire-speed throughput for 64-byte Ethernet frames. However, it is unclear whether this level of performance is really needed since the average Internet frame size is somewhere between 256 and 512 bytes, and the cost of this level of performance is prohibitive for customer premises equipment (CPE). As a result, service providers and end users typically specify gigabit-rate, wire-speed throughput for 256- or 512-byte frames.
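The gap between these targets is easier to appreciate as a frame rate. The sketch below computes the maximum number of frames per second on a 1Gbps Ethernet link, counting the 8-byte preamble and 12-byte inter-frame gap that accompany every frame on the wire. It then derives how many CPU cycles a single core would have per frame. The 500MHz clock is an assumed value, not a figure from any specific processor.

```python
ETH_OVERHEAD_BYTES = 20  # 8-byte preamble/SFD + 12-byte inter-frame gap


def frames_per_second(frame_bytes: int, line_rate_bps: float = 1e9) -> float:
    """Maximum frame rate on the wire for back-to-back frames of one size."""
    wire_bits = (frame_bytes + ETH_OVERHEAD_BYTES) * 8
    return line_rate_bps / wire_bits


def cycles_per_frame(frame_bytes: int, clock_hz: float,
                     line_rate_bps: float = 1e9) -> float:
    """CPU cycles available per frame if one core must keep up at wire speed."""
    return clock_hz / frames_per_second(frame_bytes, line_rate_bps)


CLOCK_HZ = 500e6  # hypothetical fast-path clock, not a figure from the article

for size in (64, 256, 512):
    print(f"{size:>4}-byte frames: {frames_per_second(size):>12,.0f} frames/s, "
          f"{cycles_per_frame(size, CLOCK_HZ):,.0f} cycles per frame")
```

At 1Gbps, 64-byte frames arrive at about 1.49 million frames per second, compared with roughly 453,000 and 235,000 frames per second for 256- and 512-byte frames. The per-frame processing budget therefore grows by a factor of three to six when the requirement is relaxed from 64-byte frames.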

The slow-path process can be run on a general-purpose CPU that also supports user interface, management plane and monitoring functionality, as well as applications such as data protocol stacks (e.g., Transmission Control Protocol/Internet Protocol (TCP/IP)), a firewall, network address translation (NAT), IP security (IPSec) and voice protocol stacks (e.g., Session Initiation Protocol (SIP)). Through shared memory, the slow-path CPU controls and monitors the functionality of the fast-path CPU and other on-chip system resources.

Packet processing tasks that are common to broadband access gateway applications should be off-loaded to hardware-based packet acceleration engines that support specialized tasks at wire speed. Common tasks include packet parsing, classification, policing, shaping, queuing, encryption/decryption and hashing. Conversely, the implementation and execution of less common tasks will vary from application to application, or even from packet to packet. These tasks require flexibility, so they should be implemented not in hardware, but in software running on the fast-path CPU. This approach results in a high-performance architecture that is both balanced and flexible. Balance is achieved by off-loading common tasks from the fast-path CPU to dedicated packet acceleration engines, thereby enhancing the performance of the fast-path CPU. The programmable nature of the fast-path CPU provides flexibility.
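As an illustration of why such tasks suit dedicated engines, consider policing, one of the tasks listed above. A common way to police traffic is a token bucket, sketched below in software. This is a generic textbook technique, not a description of any particular acceleration engine, and the class name and parameter values are hypothetical.

```python
class TokenBucket:
    """Single-rate token-bucket policer (illustrative sketch only)."""

    def __init__(self, rate_bps: float, burst_bytes: int):
        self.rate_bytes = rate_bps / 8      # refill rate in bytes per second
        self.burst = burst_bytes            # bucket depth in bytes
        self.tokens = float(burst_bytes)    # bucket starts full
        self.last = 0.0                     # time of the previous packet, in seconds

    def allow(self, now: float, size: int) -> bool:
        """Return True if a packet of `size` bytes arriving at `now` conforms."""
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never beyond the bucket depth.
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate_bytes)
        if size <= self.tokens:
            self.tokens -= size
            return True
        return False                        # non-conforming: drop or mark


policer = TokenBucket(rate_bps=8_000, burst_bytes=1_500)  # hypothetical limits
arrivals = [(0.0, 1500), (0.0, 100), (0.5, 500), (10.0, 1500)]
print([policer.allow(t, size) for t, size in arrivals])
```

The logic is small and identical for every packet, which is the kind of fixed, repetitive work that the paragraph above assigns to hardware rather than to the programmable fast-path CPU.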

One common packet processing task that is often overlooked by communications processor designers is voice over Internet protocol (VoIP) packet processing. VoIP processing, which is required by most, if not all, broadband access gateway applications, is often supported by “soft” codecs running on the fast-path processor. Soft codecs can be “CPU cycle-hungry,” consuming up to 30 million instructions per second (MIPS) per codec. Alternatively, VoIP processing can be offloaded from the fast-path CPU to an on-chip digital signal processor (DSP), thereby freeing up cycles on the fast-path processor. This design approach not only increases the throughput performance of the fast-path processor, but also can reduce the degradation of voice quality under high data-traffic loads.
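The cost of soft codecs can be estimated with simple arithmetic. The sketch below uses the figure of up to 30 MIPS per codec quoted above; the CPU capacity of 400 MIPS is an assumed value for illustration only.

```python
def soft_codec_load(channels: int, mips_per_codec: float = 30.0,
                    cpu_mips: float = 400.0) -> float:
    """Fraction of CPU instruction capacity consumed by soft codecs.

    mips_per_codec is the upper figure quoted in the text;
    cpu_mips is a hypothetical capacity, not a measured value.
    """
    return channels * mips_per_codec / cpu_mips


for channels in (1, 2, 4):
    print(f"{channels} channel(s): {soft_codec_load(channels):.1%} of CPU capacity")
```

Under these assumptions, four voice channels would consume 30% of the fast-path CPU, capacity that an on-chip DSP would return to packet processing.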

“Green” CPE

As energy costs have risen and the power consumption of broadband access equipment has increased with higher broadband access speeds, government regulatory bodies and service providers have recognized and begun to address the impact that the cost of powering this equipment has on the consumer and on society as a whole.

Government regulatory bodies have introduced targets intended to reduce the power consumption of broadband CPE devices over a period of time. One example of such a set of targets is the European Union's Code of Conduct on Energy Consumption of Broadband Equipment. While initially targeted at DSL and other types of modem, router, hub/switch and access point CPE, the scope of this document will be expanded to include home gateways in the near future. In addition to setting specific targets and a timeline for power reduction, this document dictates that the "... hardware has power management built in, where applicable, i.e. depending on the functionality required of the unit, the hardware will automatically switch to the state with the lowest possible power consumption." [1]

Broadband service providers have recognized the cost savings that they and their subscribers can achieve through the deployment of lower power CPE, and as a result, have launched campaigns to reduce power consumption. Verizon, for instance, recently established energy consumption requirements for broadband access equipment and CPE purchased after January 1, 2009. They are documented in Verizon's technical purchasing requirements: Verizon NEBS™ Compliance: Energy Efficiency Requirements for Telecommunications Equipment, VZ.TPR.9205, Issue 3, September 2008. Other service providers, such as FT Orange Business Services and NTT, have launched the Green IT and Total Power Revolution campaigns, respectively, to raise awareness of and to reduce power consumption.

In response to these initiatives, communications processors must be architected for low power consumption in normal operating mode and must support a lower power state that dramatically reduces power consumption when the processor is idle. These features can be realized through careful design: enabling the user to power down unused functional blocks and reduce clock rates when the processor is in a low-usage state, reducing the silicon process geometry, and integrating peripheral devices so as to minimize the power required to drive signals between chips.

Integrate to Reduce BOM Cost

Service providers and, in turn, their equipment vendors, demand continual reductions in the BOM cost of broadband access gateways. These demands drive chip vendors to reduce the cost of their products through integration and other means.

The integration of peripheral devices, such as DSPs, security processors and Ethernet switches, into a single system-on-chip (SOC) reduces the number of components, traces and layers on the printed circuit board (PCB) for any given broadband access gateway design. This reduction in components and board complexity results not only in lower materials costs, but also in reduced engineering, manufacturing and operational costs. Therefore, equipment costs are driven down, enabling equipment vendors to supply broadband access gateways to service providers at a lower price. Additionally, service providers and equipment vendors can realize revenues sooner because the overall reduction in the complexity of the design and manufacturing process improves time-to-market. The resulting cost savings and earlier recognition of revenue improve return on investment (ROI) for service providers, equipment vendors and chip vendors alike.

Setting the Benchmark

As service providers fulfill the surging demand for triple-play services to the home and secure, high-bandwidth data connectivity for small businesses and branch offices, they are demanding increased performance and functionality for the CPE deployed at these locations. These demands include gigabit-rate routing throughput for small packets, compliance with government regulations and service provider requirements for “Green” CPE, and a reduced BOM cost. Gigabit communications processors must satisfy these demands for a broad range of applications, including residential access gateways and small- to medium-sized business (SMB) secure routers.

These demands can be met by a communications processor that is architected with dual CPU cores and configured for asymmetrical multi-processing, with one CPU configured as the fast-path processor and the other configured as the slow-path processor. Multi-issue CPUs with appropriately sized L1 caches and dedicated packet acceleration engines are required to achieve wire-speed throughput performance for small packets while keeping power consumption in line with “Green” CPE initiatives. An integrated DSP guarantees carrier-grade voice quality even when the communications processor is subjected to a high data-traffic load, while freeing up CPU cycles for packet processing and other applications.

The integration of these architectural elements with a judicious variety and quantity of broadband interfaces, such as Gigabit Ethernet, Universal Serial Bus (USB) 2.0 and Peripheral Component Interconnect (PCI), maximizes flexibility, reduces time-to-market and, most importantly, minimizes BOM cost. The addition of an integrated DDR2/3 memory controller can further reduce BOM cost by enabling the use of low-cost, high-performance DDR2/3 memory. Reduced Gigabit Media Independent Interface (RGMII) and serial interface options, including PCI Express and serial Flash (Serial Peripheral Interface (SPI)), facilitate reduced pin-count packaging options and a simplified board design.

While other less integrated solutions struggle to achieve wire-speed throughput and may degrade performance when managing multiple voice channels and/or customer applications in parallel, an architecture such as this maximizes packet processing throughput for all broadband access gateway service mixes and applications. With best-in-class performance, market-leading power consumption and ground-breaking BOM cost points, this architecture enables systems and service providers to successfully and profitably bring next-generation CPE products to end users. The level of performance supported by this architecture not only exceeds the requirements of the most demanding service providers today, but will support plenty of margin for the future.

About the Author

William McDonald currently serves as director of CPE and optical product marketing for the TranSwitch Corporation located in Shelton, Connecticut, and is responsible for its communications processor (Atlanta) and Ethernet passive optical network (EPON) (Mustang) product families. He brings to TranSwitch more than 20 years of experience in the telecommunications industry with a focus on broadband access systems and semiconductors. Mr. McDonald received his bachelor's degree in electrical engineering and computer science from UC Berkeley and his master's degree in electrical engineering from Cornell University. You can reach William McDonald at 510-771-3412 or william.mcdonald@transwitch.com.

Resources

[1] European Commission, Directorate-General JRC, Joint Research Centre, Institute for the Environment and Sustainability, Renewable Energies Unit. Code of Conduct on Energy Consumption of Broadband Equipment, Version 2, 17 July 2007, page 6.
