In the ATM class, the 500-lb gorilla beats the hell out of the heavier but overweight old 800-lb gorilla. Too bad. Just look at the rounds below (summarized for a quick view).
ROUND 1: Call Completion Rates
Fore's ASX-1000 features a new controller module built around a 166-MHz Pentium chip that lets a single core switch set up and tear down 382 calls per second (see Figure 1). That's with eight OC3 ports handling bidirectional traffic. Cisco's Lightstream 1010 maxed out at 104 completed calls per second; IBM's 8265, at 99 calls. Fore actually hit 384 calls per second with two pairs of ports fielding unidirectional traffic.

ROUND 2 (7 msec vs. 199 msec; too bad the chubby gorilla can't duck fast enough)
In the test with two table entries, Fore's ASX-1000 registered the lowest reconvergence time, setting up a new path in just 7 milliseconds. IBM's 8265 needed 180 ms; Cisco's Lightstream 1010, 199 ms (see Figure 2). In the test with three table entries, the ASX-1000 set up a new link in 11 ms, compared with 159 ms for IBM and 283 ms for Cisco.
ROUND 3 (the 500-lb gorilla throwing quick punches)
On switches with per-VCC queuing, latency was lower—by up to three orders of magnitude. The average delay for Fore's ASX-1000 was just 20.1 microseconds, one-thousandth that of switches without per-VCC queuing. IBM's 8265 wasn't far behind, with an average latency of 28.5 microseconds. And Cisco's Lightstream 1010 averaged 76.3 microseconds.
ROUND 4: TKO. Too much for the chubby 800-pounder.
The BPX/Axis would come out even worse, since it doesn't support most of the features tested, such as MPOA and fast SVC setup.

----------------------------

Our exhaustive ATM evaluation finds three switches that boast supersonic signaling, robust reconvergence, and high-IQ queuing mechanisms
ATM's critics have always argued that the technology is too smart for its own good. Net managers who've gone glassy-eyed at one too many ATM seminars are likely to agree. They want to build networks, not delve into deconstructionist theory—and gigabit Ethernet switch vendors are all too happy to oblige. Vendors of frame-based boxes argue that fat pipes and low prices are the ultimate no-brainer, and their simple strategy is helping them grab a share of the campus LAN market.
So what do ATM switch vendors decide to do? Outsmart the competition (what else?). And judging by the results of our latest Data Comm lab test, they've done just that. Their boxes are an impressive combination of brains and brawn. They boast sophisticated queuing algorithms that protect priority traffic without invoking ATM service classes. And they reroute around failures more than 200 times faster than their frame-based counterparts.
Our biggest disappointment, in fact, is that this high-IQ hardware is in such short supply. Only three vendors answered our call for products: Cisco Systems Inc. (San Jose, Calif.), Fore Systems Inc. (Warrendale, Pa.), and IBM. And only Cisco supplied both core (backbone) switches and edge devices. Still, our testing trio represents the lion's share of ATM sales. (Where were the other switch suppliers? Not ready with product or busy selling gigabit Ethernet.)
This time out, Data Comm and its testing partner European Network Laboratories (ENL, Paris) decided to pull out all the stops. We put our participants through an exhaustive evaluation that covered ATM signaling, reconvergence around failures, forwarding fairness and latency, and Lane (LAN emulation). We also tried to test MPOA (multiprotocol over ATM), but none of the vendors brought a working implementation.
All the boxes did a bang-up job (except for MPOA, of course). But Fore's ASX-1000 walked away with the Tester's Choice award. Its performance was nothing short of astonishing: It handled more ATM (asynchronous transfer mode) circuits and rerouted around failures faster than any switch (cell- or frame-based) we've tested. What's more, it was unerringly fair when dealing with congested circuits.
Get the Signal
Our first three sets of tests—signaling, rerouting, and forwarding fairness—involved only core switches; the Lane tests involved both core and edge switches.
We took signaling first because it's the bedrock mechanism of any ATM network (or any net, for that matter). On a network where hundreds or thousands of nodes use Lane, switches must be able to set up and tear down a sizable number of SVCs (switched virtual circuits). And all those connections add up to a lot of signaling. (For the record, SVCs connect Lane clients with the three types of Lane servers; with file, print, and application servers; and with one another [see "Rolling Out ATM QOS for Legacy LANs," March 21, 1998; data.com]).
We measured user-to-network interface version 3.1 (UNI 3.1) signaling on one, two, and four ATM switches with OC3 (155-Mbit/s) and OC12 (622-Mbit/s) interfaces (see "Test Methodology"). We set up calls between two, four, six, and eight ports in unidirectional, bidirectional, and fully meshed patterns.
All told, there were more than 40 permutations. What's more, new equipment from Netcom Systems Inc. (Chatsworth, Calif.) let us examine both call setup and tear down (prior tests only looked at setup).
Call setup and tear down are processor-intensive tasks. Early signaling software topped out at around 10 calls per second. As vendors moved to faster CPUs (most commonly, a 25-MHz Intel i960), call setup rates reached about 40 calls per second. But it's possible to go much higher still.
Figure 1: Call Completion Rates
Fore's ASX-1000 features a new controller module built around a 166-MHz Pentium chip that lets a single core switch set up and tear down 382 calls per second (see Figure 1). That's with eight OC3 ports handling bidirectional traffic. Cisco's Lightstream 1010 maxed out at 104 completed calls per second; IBM's 8265, at 99 calls. Fore actually hit 384 calls per second with two pairs of ports fielding unidirectional traffic.
Double Dipping
When we moved to two switches, Fore set up and tore down 278 calls per second. Again, the configuration involved eight OC3 ports (four on each switch) communicating in a bidirectional pattern. That's far faster than Cisco's 83 calls or IBM's 99. And when fielding fully meshed traffic on eight OC3 ports, Fore churned through 440 calls per second.
Fore was even farther out in front when we moved to four switches, setting up and tearing down 455 calls per second with fully meshed traffic on four OC3 ports (one on each switch). Cisco couldn't complete the four-switch test because of a faulty module; IBM topped out at 162 calls.
It's worth noting that calling rates climb as switches are added. That proves two things: First, calls across multiple chassis leverage the processing power on each chassis, thus lightening the load on each switch. Second, the mechanism used to update switches about network topology is essentially transparent to call signaling.
Keeping Up to Date
That mechanism is PNNI (private network-to-network interface), one of the critical technologies that keep ATM networks up and running. PNNI is similar to TCP/IP routing protocols like OSPF (open shortest-path first): It updates ATM switches about route changes. If a switch or a link between switches fails (more commonly, if a server goes down and a backup is brought online), PNNI tells other switches to reroute traffic around the trouble.
PNNI serves two functions in virtually all ISP and corporate ATM backbone nets: It sets up SVCs between ATM switches and redirects traffic around failed links or switches. The only place PNNI isn't used is on large carrier networks, which use NNI (network-to-network interface) to move traffic between backbones.
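For readers who want to see the rerouting mechanism in miniature, here is a toy sketch of the shortest-path recomputation that PNNI shares with link-state protocols like OSPF. It illustrates the general idea only, not PNNI's routing hierarchy, timers, or packet formats; the four-switch ring, node names, and unit link costs are invented, though the topology echoes our first reconvergence test bed.

```python
# Illustrative only: a toy link-state database and shortest-path recompute.
# This is the general mechanism PNNI shares with OSPF, not PNNI itself.
import heapq

def shortest_path(links, src, dst):
    """Dijkstra over a dict of {node: {neighbor: cost}}."""
    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == dst:
            break
        if d > dist.get(node, float("inf")):
            continue                      # stale heap entry
        for nbr, cost in links.get(node, {}).items():
            nd = d + cost
            if nd < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = nd, node
                heapq.heappush(heap, (nd, nbr))
    # Rebuild the path from dst back to src
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return list(reversed(path))

# Four core switches connected in a ring (one link between each pair of
# neighbors), roughly like the first reconvergence test bed.
topology = {
    "A": {"B": 1, "D": 1},
    "B": {"A": 1, "C": 1},
    "C": {"B": 1, "D": 1},
    "D": {"A": 1, "C": 1},
}

print(shortest_path(topology, "A", "B"))   # ['A', 'B'] -- the direct link

# Break the A-B link; the only path left is the long way around the ring.
del topology["A"]["B"]
del topology["B"]["A"]
print(shortest_path(topology, "A", "B"))   # ['A', 'D', 'C', 'B']
```

When the direct link disappears, the recomputation immediately yields the long way around the ring; the time it takes a real switch to reach that same conclusion and install the new path is exactly what our reconvergence tests measure.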
Since PNNI is deployed on big nets, it must meet two key requirements. Configuration complexity shouldn't scale with the size of the network. And routes should reconverge around failures as fast as possible.
To simplify connection management and billing, administrators of corporate and ISP backbones typically use PVCs (permanent virtual circuits) on links to the edge of the ATM network. But PVCs severely limit net managers' ability to take advantage of PNNI's dynamic routing and rerouting.
Whenever a link changes, the net manager must manually redefine a new PVC for the route, supplying details about switch and port numbers for each link. Given that a large network may involve hundreds of potential points of failure, the PVC approach doesn't scale well.
A better approach is the so-called soft PVC. Net managers simply supply a destination ATM address, and PNNI figures out where that address resides. Because there's no 1:1 correspondence between ATM addresses and physical switch and port numbers, soft PVCs remap routes whenever network conditions change. All three vendors in this test support soft PVCs.
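As a purely conceptual sketch (no vendor CLI syntax; the switch names, port labels, and address placeholder below are invented), the difference comes down to what the net manager supplies: a classic PVC is a hand-maintained list of hops, while a soft PVC is just a destination ATM address that gets resolved against the current topology.

```python
# Conceptual sketch only: switch names, ports, and the address placeholder are
# invented. A classic PVC pins every hop by hand; a soft PVC stores only the
# destination ATM address and re-resolves the path from the current topology.

def resolve(address_map, topology, src_switch, dst_addr):
    """Find the switch that owns dst_addr, then a shortest hop-count path (BFS)."""
    dst_switch = address_map[dst_addr]
    frontier, prev = [src_switch], {src_switch: None}
    while frontier:
        node = frontier.pop(0)
        if node == dst_switch:
            break
        for nbr in topology[node]:
            if nbr not in prev:
                prev[nbr] = node
                frontier.append(nbr)
    path, node = [], dst_switch
    while node is not None:
        path.append(node)
        node = prev[node]
    return list(reversed(path))

topology = {"SW1": ["SW2", "SW4"], "SW2": ["SW1", "SW3"],
            "SW3": ["SW2", "SW4"], "SW4": ["SW1", "SW3"]}
address_map = {"ATM-ADDR-OF-SERVER": "SW3"}   # end-system address -> home switch

# Classic PVC: every hop and port spelled out by hand; breaks if any hop changes.
static_pvc = [("SW1", "port1"), ("SW2", "port3"), ("SW3", "port2")]

# Soft PVC: only the destination address is configured; the path is recomputed.
print(resolve(address_map, topology, "SW1", "ATM-ADDR-OF-SERVER"))
# -> ['SW1', 'SW2', 'SW3']; if the SW1-SW2 link went away, a fresh call would
#    return ['SW1', 'SW4', 'SW3'] with no reconfiguration by the net manager.
```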
Steer Clear
PNNI also must reroute around trouble in a hurry. To see just how fast, we connected OC3 ports on four backbone switches, deliberately broke links among them, and measured how long it would take for PNNI to move traffic onto the new paths.
In the first reconvergence test, we set up four core devices with one physical link between each. After verifying that traffic traveled over the shortest path between switches, we broke the physical link between two. We then measured the amount of time required to reroute traffic over the only remaining path. In this configuration, each switch's routing table contained two entries on how to get from one switch to any other switch.
We then set up the test bed so that there were three physical links joining each switch in a fully meshed pattern. This time, we broke all but one link between two switches. In this configuration, each switch's routing tables contained four possible paths to any other switch. More table entries means more effort for each switch to calculate the shortest path to other switches.
Figure 2: Rerouting Around Trouble
In the test with two table entries, Fore's ASX-1000 registered the lowest reconvergence time, setting up a new path in just 7 milliseconds. IBM's 8265 needed 180 ms; Cisco's Lightstream 1010, 199 ms (see Figure 2). In the test with three table entries, the ASX-1000 set up a new link in 11 ms, compared with 159 ms for IBM and 283 ms for Cisco. (We set failover timers to zero on all switches to eliminate delay between the time we broke the link and the time rerouting began.)
These numbers are extremely good news for ATM switch vendors. We recently ran the exact same test with Layer 3 switches that implement OSPF, with very different results. Reconvergence times ranged from 1.6 to 10 seconds (see "Multilayer Switches: In the Beginning," November 1997, data.com ). The PNNI results are more than 200 times better than the frame-based numbers. That clearly indicates that large enterprise and ISP networks stand to boost performance substantially by using PNNI as their backbone routing protocol.
Playing Fair
Another critical measure of any ATM switch is the fairness of its queuing mechanism. The need for queuing fairness isn't obvious at first; after all, ATM's oft-touted QOS (quality of service) is supposed to give key apps guaranteed boundaries on bandwidth and latency.
But these guarantees are honored mostly in the breach. Today, most corporate ATM backbones use Lane 1.0, which can't distinguish among QOS types. It simply sends all traffic over UBR (unspecified bit-rate) circuits. Trouble is, when congestion occurs, most switches simply dump UBR traffic into a FIFO (first-in, first-out) queue. If a high-priority video stream gets stuck behind low-priority Web traffic, too bad. Worse, the video may be dropped if the queue doesn't empty out fast enough. Thus, ATM switch vendors need to prove they can play fair when different streams of the same ATM traffic type contend for scarce bandwidth, even though each stream has radically different bandwidth and delay requirements.
All three switch vendors say the answer is per-VCC queuing, a technique that allocates a separate queue to each VCC (virtual channel connection) and then services the queues using a weighted round-robin algorithm. The vendors say this approach fairly distributes cell loss among lightly and heavily loaded channels.
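Here is a minimal sketch of the technique as the vendors describe it: one bounded queue per VCC, serviced with weighted round-robin. The weights, queue depth, and VCC names are invented for illustration; real schedulers do this in hardware at cell timescales.

```python
from collections import deque

class PerVccScheduler:
    """Toy per-VCC queuing: one bounded queue per virtual channel, serviced
    with weighted round-robin. Weights and queue depth are made-up numbers."""

    def __init__(self, weights, depth=100):
        self.weights = weights                       # vcc -> cells per round
        self.queues = {v: deque() for v in weights}  # one queue per VCC
        self.depth = depth
        self.dropped = {v: 0 for v in weights}

    def enqueue(self, vcc, cell):
        q = self.queues[vcc]
        if len(q) >= self.depth:
            self.dropped[vcc] += 1       # overflow charged to THIS VCC only,
        else:                            # not to everyone sharing a FIFO
            q.append(cell)

    def service_round(self):
        """One WRR pass: each VCC may send up to its weight in cells."""
        sent = []
        for vcc, weight in self.weights.items():
            q = self.queues[vcc]
            for _ in range(min(weight, len(q))):
                sent.append(q.popleft())
        return sent

# Example: a fat video VCC and a thin Telnet VCC share one output port.
sched = PerVccScheduler({"video": 8, "telnet": 1}, depth=10)
for i in range(50):
    sched.enqueue("video", f"v{i}")
for i in range(5):
    sched.enqueue("telnet", f"t{i}")
print(sched.service_round())   # 8 video cells and 1 telnet cell this round
print(sched.dropped)           # video overflow is charged to video's own queue
```

The key property shows up in the drop counters: overflow on the heavily loaded channel is charged to that channel alone, instead of squeezing out everything else stuck behind it in a shared FIFO.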
To see how well per-VCC queuing stands up to congestion, we loaded each of eight inbound OC3 ports with 7.488-Mbit/s streams of traffic and sent one stream of OC12 traffic to one inbound port. (We tested Cisco's switch with seven inbound ports, not eight, because of a configuration oversight on our part.) All this traffic was destined for a single outbound OC12 port, thus overloading the port and forcing the switch's queuing mechanism to decide what to drop—traffic from the heavily loaded port or the lightly loaded ones. In our tests, none of the switches employed usage parameter control (UPC), another means of policing ATM traffic.
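A quick back-of-the-envelope check shows why this load mix forces roughly 9 percent of the offered cells to be dropped, whichever policy a switch uses. The sketch assumes the fully loaded OC12 stream runs at about the 599-Mbit/s SONET payload rate (our working assumption, not a specification of the test gear).

```python
# Back-of-the-envelope check of the offered load in this test.
# Assumes the "full OC12" background stream runs at the ~599-Mbit/s SONET
# payload rate (4 x 149.76 Mbit/s); actual cell rates differ slightly.
oc3_payload  = 149.76            # Mbit/s usable on an OC3
oc12_payload = 4 * oc3_payload   # ~599.04 Mbit/s usable on an OC12

small_streams = 8 * 7.488        # eight lightly loaded OC3 ports
background    = oc12_payload     # one fully loaded OC12 stream
offered       = small_streams + background

excess = offered - oc12_payload  # everything beyond the outbound OC12
print(f"offered {offered:.1f} Mbit/s, excess {excess:.1f} Mbit/s "
      f"({100 * excess / offered:.1f}% must be dropped)")
# -> roughly 9% of the offered cells have nowhere to go
```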
Per-VCC Philosophies
Figure 3: Forwarding Fairness With Evenly Distributed Streams
What we found is that vendors have very different philosophies as to how per-VCC queuing should be implemented (see Figure 3). Cisco's approach is to drop around 12 percent of the OC12 traffic, thus ensuring that 100 percent of each of the 7.488-Mbit/s streams gets delivered. Fore and IBM, in contrast, drop traffic evenly from all the streams. Fore drops exactly 9 percent from all streams, regardless of bandwidth. IBM drops 9 percent from the OC12 stream and 5 percent or 6 percent among the smaller streams.
There's no one right approach. Fore argues that sometimes the bigger stream has the higher priority, and therefore loss should be distributed evenly across all streams—something it does exactly, down to the cell. Cisco's take on the topic is that smaller streams should always be protected from starvation. Which is better depends on the traffic patterns and design of each individual network.
Cisco's approach works best on networks where the highest priority goes to low-bandwidth applications like Telnet.
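To see how the two philosophies play out on this test's load mix, here is a toy comparison. The capacity figure and both policy functions are our simplified models, not either vendor's actual algorithm.

```python
# Toy model of the two per-VCC drop philosophies seen in this test.
# Loads in Mbit/s: eight small streams plus one full OC12 background stream,
# all contending for one outbound OC12 (~599 Mbit/s payload -- our assumption).
loads = [7.488] * 8 + [599.04]
capacity = 599.04

def even_drop(loads, capacity):
    """Fore-style: every stream gives up the same fraction (IBM comes close)."""
    keep = capacity / sum(loads)
    return [l * keep for l in loads]

def protect_small(loads, capacity):
    """Cisco-style: deliver small streams in full, charge all loss to the biggest."""
    small, big = loads[:-1], loads[-1]
    return small + [capacity - sum(small)]

for name, policy in [("even", even_drop), ("protect-small", protect_small)]:
    delivered = policy(loads, capacity)
    loss_big = 100 * (1 - delivered[-1] / loads[-1])
    print(f"{name}: big stream loses {loss_big:.1f}%, "
          f"small streams lose {100 * (1 - delivered[0] / loads[0]):.1f}%")
```

The even-drop model lands near the 9 percent figure we measured on Fore's switch, while the protect-small model pushes all the loss onto the big stream, roughly the pattern Cisco's results show.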
Our first queuing test assumed exactly identical loads across multiple ports—a rare situation on real-world networks. We also evaluated queuing fairness by offering different loads to each inbound port, sending incrementally higher amounts of traffic.
Thus, stream 2 had twice as much traffic as stream 1; stream 3, twice as much as stream 2; and so on. As before, we also offered a background load consisting of a fully loaded OC12.
Figure 4: Forwarding Fairness With Incremental Streams
Here again, Fore clipped all streams by exactly the same amount (down to the cell, in fact); Cisco dumped more traffic from the OC12 stream. But IBM's results changed: Instead of dropping traffic more or less evenly across all streams, it delivered only 65 percent of the OC12 stream and 95 percent of the other streams. It appears that IBM's queuing algorithm taxes all streams the same when loads are distributed evenly and taxes heavily used circuits most when loads are distributed incrementally.
Switching Slowdown?
We also measured latency to see what per-VCC queuing would do for delay-sensitive traffic. Past tests have shown that switches without this function exhibit unacceptably high latencies for voice and video. For example, an OC3 switch with a 10,000-cell FIFO queue has a latency equivalent to its queue depth multiplied by the time needed to transmit one cell (2.83 microseconds for OC3 over Sonet), which works out to more than 28 milliseconds. That's much longer than the delay requirements of some video apps.
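The arithmetic behind that 28-millisecond figure, and how it compares with the per-VCC averages reported below, is simple enough to spell out:

```python
# Worst-case queuing delay for a shared FIFO, using the figures in the text.
cell_time_us = 2.83          # microseconds to transmit one cell at OC3 over Sonet
fifo_depth   = 10_000        # cells queued ahead of you in a shared FIFO

fifo_delay_ms = fifo_depth * cell_time_us / 1000
print(f"shared FIFO worst case: {fifo_delay_ms:.1f} ms")   # ~28.3 ms

# Compare with the measured per-VCC averages (microseconds):
for switch, latency in {"Fore ASX-1000": 20.1, "IBM 8265": 28.5,
                        "Cisco Lightstream 1010": 76.3}.items():
    print(f"{switch}: {latency} us, about {fifo_delay_ms * 1000 / latency:,.0f}x lower")
```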
On switches with per-VCC queuing, latency was lower—by up to three orders of magnitude. The average delay for Fore's ASX-1000 was just 20.1 microseconds, one-thousandth that of switches without per-VCC queuing. IBM's 8265 wasn't far behind, with an average latency of 28.5 microseconds. And Cisco's Lightstream 1010 averaged 76.3 microseconds. Fore earns bragging rights with the lowest latency, but for practical purposes any of these switches are well below the delay requirements of voice and video.
And those low numbers are a very big deal. They ensure that PC multimedia apps with no provision for setting up CBR (constant bit-rate) or VBR (variable bit-rate) circuits stand a very good chance of running over UBR—provided the switches offer per-VCC queuing, even when the network is heavily loaded.