WJ., Perhaps this is way off topic as to who would want to embed FC fabrics, but maybe it is foreshadowing.
From the "From here to Infiniband" thread.
Section: Embedded Systems -- Focus: Networking and Datacom
--------------------------------------------------------------------------------
Edge-switch intelligence sharpened
Bill Weir, Vice President of Marketing, Power X Networks Inc., San Jose, Calif.
Last year, for the first time, more data than voice was sent through U.S. networks. It's no secret that packet switching is far exceeding circuit switching, and carriers are gobbling up packet-switching products. To this demand add the explosion of technologies such as DSL, and you can see that switching is the fabric of our business lives.
Switching was a hit on the local network level in the mid-'90s, thanks to low-cost manufacturing of switches based on shared-memory architecture. The largest such units could handle traffic for an enterprise, or even a campus. Problems arose, however, when bandwidths ran past 20 Gbits/s: the product scalability wasn't there. Scaling meant adding more boxes.
Shared-memory switches are still used as access switches into larger networks, but they have fairly high latency, are cost-intensive, and require massive amounts of storage. On top of their RAM needs and power consumption, these switches are burdened with policy overhead and packet handling.
Because shared-memory switches are built around one large central memory repository, switch bandwidth is intimately tied to memory bandwidth. To increase switch bandwidth, you increase memory bandwidth, which means more pins and consequently more power dissipation. Size the memory to handle large-scale data streams, and the vast majority of normal Internet Protocol traffic, driven by burgeoning Internet use, will leave that additional memory bandwidth idle. Product and chip-set designers following this methodology soon meet the law of diminishing returns: much of the added memory bandwidth sits unused, while the added pins take their toll in power, space, and cost. In a nutshell, shared memory is a self-limiting architecture.
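The scaling trap can be sketched with some rough arithmetic. This is an illustrative model only (the per-pin rate and the write-plus-read factor are assumptions, not vendor data):

```python
# Illustrative model: in a shared-memory switch, every packet is written
# to and then read from one central memory, so switch bandwidth is
# capped by memory bandwidth, which in turn scales with pin count.

def required_memory_pins(switch_gbps, per_pin_mbps=200):
    """Pins needed on the memory interface for a given switch bandwidth.

    Each packet crosses the memory twice (one write, one read), so the
    memory interface must carry 2x the switch bandwidth. per_pin_mbps
    is an assumed per-pin signaling rate.
    """
    memory_mbps = 2 * switch_gbps * 1000
    return memory_mbps // per_pin_mbps

# Doubling switch bandwidth doubles the pin count (and roughly the
# interface power), which is the self-limiting behavior described above.
print(required_memory_pins(20))   # pins for a 20 Gbit/s switch
print(required_memory_pins(40))   # pins for a 40 Gbit/s switch
```

Whatever the exact per-pin rate, the linear pin growth is the point: the memory interface, not the switching logic, becomes the scaling bottleneck.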
As demands were placed on the telephony and datacomm resources of core networks, it became obvious that a crossbar architecture was more appropriate. Unlike shared-memory switches, crossbar switches distribute the memory in smaller amounts near the network processor. Rather than a CPU controlling a large amount of shared memory, the crossbar switch uses an arbitration scheme to control traffic moving into and past the crossbar. In contrast to the shared-memory switch, the crossbar switch has multiple bandwidth paths and a very low pin count due to its highly integrated nature and its smaller, distributed memory units.
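The arbitration idea can be pictured with a toy matcher. This is a simplified, fixed-priority stand-in (real crossbar schedulers rotate priority to stay fair); all names here are illustrative:

```python
# Toy single-iteration arbiter for an N x N crossbar: each input
# requests one or more output ports; the arbiter grants at most one
# output per input and one input per output in each cell time.

def arbitrate(requests, n_ports):
    """requests: dict mapping input port -> set of requested outputs.
    Returns dict mapping input port -> granted output (a matching)."""
    grants = {}
    taken_outputs = set()
    for inp in range(n_ports):                      # fixed priority for
        for out in sorted(requests.get(inp, ())):   # brevity only
            if out not in taken_outputs:
                grants[inp] = out
                taken_outputs.add(out)
                break
    return grants

# Inputs 0 and 1 both want output 2; only one wins this cell time,
# and input 1 falls back to its other request.
print(arbitrate({0: {2}, 1: {2, 3}}, 4))   # -> {0: 2, 1: 3}
```

The losing request simply waits for the next cell time, which is why arbitration speed matters so much to crossbar throughput.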
Most core switches are of the crossbar variety. These switches excel at moving large amounts of data from point to point and are very reliable. However, they lack intelligence. Further, these switches are designed with a multiple-interconnect crossbar. Data can take many paths through the switch. This is an upside in terms of aggregate bandwidth because heavy traffic can get through the switch in any number of ways.
The very strength of the multiple-interconnect architecture is also its greatest downfall. Consider what that architecture means from a quality-of-service (QoS) point of view: QoS is nearly impossible. That matters most to the service provider who regularly signs service-level agreements with customers. Why? It is impossible to guarantee service when two successive packets take two different paths through a switch. In some cases, packet 2 may reach the output before packet 1, which forces buffering and reordering, thus introducing latency.
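The reordering cost can be sketched with a toy reorder buffer (this is a generic model, not any vendor's design): when packet 2 arrives before packet 1, the receiver must hold it until the gap fills.

```python
# Toy reorder buffer: packets carry sequence numbers; out-of-order
# arrivals are held until every earlier packet has arrived, which is
# where multipath switches pick up extra latency.

def reorder(arrivals):
    """arrivals: sequence numbers in arrival order.
    Returns (in-order delivery list, max packets buffered at once)."""
    buffered, delivered = {}, []
    expected, max_held = 0, 0
    for seq in arrivals:
        buffered[seq] = True
        while expected in buffered:     # drain any contiguous run
            del buffered[expected]
            delivered.append(expected)
            expected += 1
        max_held = max(max_held, len(buffered))
    return delivered, max_held

# Packet 0 arrives last: everything behind it sits in the buffer.
print(reorder([2, 1, 3, 0]))   # -> ([0, 1, 2, 3], 3)
```

With a single-path (single-stage) fabric, arrivals stay in order and the buffer depth stays at zero, which is exactly the latency advantage claimed below.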
Access switches and core switches perform such different tasks and have such different requirements that it's easy to see why they have evolved into distinct technology sets. Yet one of the most interesting aspects (and one of the most problematic) is the area in between access and core. You might call this edge switching. This middle area requires speeds from roughly 30 Gbits/s to over 1 Tbit/s. What would a manufacturer of this type of device want? Certainly the switch would need to be intelligent, because it must do more than just pass groups of packets. Complete redundancy is a must, speed is critical, and scalability must be designed into the solution.
We decided to look at the problems of edge switching from a slightly different perspective. For starters, we chose to build a design utilizing some unique architectural differences: a commitment to single-stage switch fabric, and an innovative use of embedded physical-layer links (PHYs), which use a synchronous, asymmetrical backplane technology rather than the standard asynchronous, symmetrical technology that was propagated from discrete "line-side" applications.
Manufacturers moved into designs that incorporated CMOS technology a few years ago because of the promise of low power dissipation and high levels of silicon integration. In fact, so much attention was paid to integrating digital functions onto one IC that little attention was paid to the growing need for multichannel analog PHY links, particularly as it applies to the backplane.
Most manufacturers redesigned serial links into CMOS for backplane applications by translating existing symmetric, single-channel, wire-side link designs. Symmetric, wire-side data links on both sides of the link required separate phase-locked loops, additional clocks, and attention to equal-length printed-circuit board traces to ensure proper data alignment. Having to restrict pc-board traces is a real issue when laying out the crowded backplane.
In effect, it was a traditional, single-channel PHY design overlaid on embedded, adjacent, multichannel backplane designs. This approach, we felt, had some serious problems. First, there is the issue of injection locking due to cross-channel phase-locked-loop (PLL) interaction. Also, cross-channel data-line interaction caused crosstalk, and at high speed these switch designs exhibited bit-error-rate degradation. More mundanely, designs added circuitry and complexity and restricted board trace routing. Ironically, by replicating the CMOS-based technology many times on one chip-set, injection locking was almost assured in any multiple port design.
We tackled these PHY issues by moving from a traditional asynchronous, symmetric link design to a synchronous, asymmetric design for the embedded PHY backplane application. The technology we decided to employ, called synchronous transmit and receive, incorporates a synchronous PHY backplane. By using a synchronous backplane approach, the circuit implementation on the fabric chip (which contains the highest number of serial links in the chip set) is simplified and aggregate power dissipation is decreased. Data is locked on to a single common frequency, but crosses many channels.
We also used a master/slave approach, with a single clock generator spanning all channels. One side of the link does not require high-speed PLLs, which reduces system circuitry and eliminates injection locking, crosstalk, power-dissipation problems, and pc-board trace restrictions. Up to 64 PHY serial-link channels can be put on one integrated circuit. This set of methodologies will allow the design of systems that may rise to hundreds of PHYs in the near future. The technology has been validated at 1 Gbaud, and the next generation of this design will boost speeds to 3.125 Gbaud. The asymmetric serial-link technology is utilized for high-speed links within an ASSP switch-fabric chip set for data and telecom applications.
As the port count scales up in other switches, the use of traditional line-side PHY integration becomes much more difficult when applied to the backplane, especially in trying to integrate these designs. The Power X embedded PHY architecture obviates the many problems encountered when using line-side PHYs in the backplane. More importantly, the architecture allows successful integration of the PHY function into the chip set. The approach we took solves several problems relating to intelligence, scalability, throughput and power, space and cost issues.
Priority levels
Edge switches must be intelligent if they are to perform under QoS pressures. We employ an arbitration that allows 16 levels of traffic prioritization. It also incorporates bandwidth allocation from any input port to any output port, bringing policy-based networking that much closer to the core network. Because we use a single-stage switch fabric architecture, end-to-end flow control is possible.
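One way to picture 16-level prioritization is a strict-priority scheduler. This is a hedged sketch: the article does not describe Power X's arbitration internals, and real edge-switch arbiters also enforce the per-port bandwidth allocation mentioned above, which this toy omits.

```python
import heapq

# Toy strict-priority queue with 16 levels (0 = highest priority).
# A monotonic counter breaks ties so packets at the same level keep
# FIFO order.

class PriorityScheduler:
    def __init__(self, levels=16):
        self.levels = levels
        self._heap = []
        self._count = 0

    def enqueue(self, packet, level):
        assert 0 <= level < self.levels
        heapq.heappush(self._heap, (level, self._count, packet))
        self._count += 1

    def dequeue(self):
        return heapq.heappop(self._heap)[2]

s = PriorityScheduler()
s.enqueue("bulk-data", 12)
s.enqueue("voice", 1)
s.enqueue("video", 4)
print(s.dequeue(), s.dequeue(), s.dequeue())   # voice video bulk-data
```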
Ever-growing bandwidth requirements have forced manufacturers to offer very high port counts in small-footprint switches. These switches must scale well to compete on a life-cycle basis. Crossbar switches are scalable by nature, and with the addition of highly integrated circuitry and chip sets, no pin limit is reached and no revision sync is encountered. The technique we use allows switches to scale from 20 Gbits/s to more than 1 Tbit/s and beyond, while maintaining compatibility between successive generations of chip sets. To scale the switch, one can increase the serial links within the chip set and increase the number of ports, which increases bandwidth without the downsides encountered in shared-memory switches or in other, less robust crossbar designs.
The speed of the arbitration algorithm adds to the throughput capabilities. In addition, the switching fabric is non-blocking under all loads and does not add latency. The non-blocking fabric also simplifies the backplane design and increases port bandwidth.
Power dissipation is kept to a minimum because the system does not rely on central memory and an ever-growing pin count, which rob other systems of efficiency in power and space. Also helping are embedded PHY serial links designed to reduce power. Cost factors might be renamed "flexibility factors": programmable features allow IT managers to make the silicon respond differently through changes in software. Systems based on Power X technology can adapt to changing conditions without requiring replacement.
Resilient crossbars
Next-generation switches will continue to rely on crossbar architecture to meet these challenges. Higher speeds are a certainty, and edge routing will see speeds in the multiple terabit range soon. Indeed, the strength in network switching may have ramifications at the enterprise level.
What is an edge switch today will become the enterprise switch of tomorrow. Indeed, as networks become speedier, existing shared-bus architectures, such as PCI, are coming into question and in fact represent the system bottleneck in some cases.
Infiniband is a new standard based on switched serial links to device groups, with single-link speeds operating at 2.5 Gbaud point-to-point in a single direction. Devices and device groups that comply with the standard can be connected to several hosts via a switching fabric. The ability to manage traffic will have important ramifications in this market, as will the ability to manage the entire switch fabric. The solutions that will be integrated seamlessly into Infiniband will be those that incorporate flow control and the low latency capabilities that it requires.
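The raw-versus-usable link rate is worth a quick calculation. The original Infiniband spec uses 8b/10b line coding (10 line bits carry 8 data bits), so a 2.5-Gbaud 1x link delivers 2.0 Gbits/s of data per direction; the 4x multiplier below reflects the standard's wider link widths:

```python
# Infiniband 1x signaling runs at 2.5 Gbaud. With 8b/10b line coding,
# only 8 of every 10 line bits are payload data.

def data_rate_gbps(baud_gbaud, data_bits=8, line_bits=10):
    """Usable data rate for a serial link with simple block coding."""
    return baud_gbaud * data_bits / line_bits

print(data_rate_gbps(2.5))        # 1x link, per direction
print(data_rate_gbps(2.5) * 4)    # 4x link, per direction
```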
After reading this article I have the start of an understanding as to why LU, CSCO, NT, and similar companies may be involved with Iband. Will QLGC come to be a major supplier of Iband fabric to folks that want to play in this space?
PS- I seem to remember that ANCR has never had a shared-memory architecture, but BRCD did. This article helps explain why they have moved (or will move) from that design. Do I have this right?
Who will be the first to sign up for Q's end-to-end solutions? Compaq, Dell, Fujitsu, Toshiba, SUNW, IBM? My own thought is that those with the least in-house expertise, those slowest to market, and those who need to catch up may be the first. Someone like Toshiba.