Some more interesting reading:
lanmag.com
Terabit Routers: A Lesson in Carrier-Class Confusion
Next-generation IP services demand next-generation routing. Will terabit routers fit the bill?
by David Greenfield
Carrier class: that's how seven router vendors describe their newest high-end creations. These new terabit packet blasters are supposed to enable ISPs, the carriers of the new millennium, to transform the Internet into the next-generation phone network - fault tolerant, adaptable, and perfectly capable of beaming high-end videoconferencing or broadcast television to PCs across the enterprise. All of which should leave network managers, the recipients of these nifty network services, asking one question: Are these so-called carrier-class routers tough enough?
The answer is anything but a clear affirmative. Despite months of hype, product deliveries are still in the early stages, and even shipped products lack key features - all of which makes definitive assessments difficult. However, if the views of early adopters are any indication, terabit routers come up short on what carriers really want - the reliability and uptime common to telephone networks.
"Carrier-class is a term thrown around by a lot of people," says Jason Martin, director of technology at Williams Communications ( www.wilcom.com ), "but I don?t think [terabit routers] are there today." Martin is after routers that simply don't fail. The test? "Can you yank all cards out, knock machines down, and still maintain the network?" he asks.
That might be a bit extreme, but none of the vendors shipping product - not Lucent, Avici Systems ( www.avici.com ), nor Cisco Systems - are delivering routers that can keep purring as critical cards are pulled. Nor can they update the router code without affecting the router's operations. Tough standards? Absolutely. However, anything touted as a carrier-class product should meet carrier-class expectations - especially when it carries a carrier-class price tag.
Consider this: Most of these boxes start at roughly $6,000 per OC-3 (155Mbit/sec) interface when carrying Packet over SONET (POS) and $26,000 for ATM traffic. And that's not even the high end. Purchase an OC-192 (9.952Gbit/sec) interface, and the per-port price tops $200,000.
Does this mean network managers can forget about those next-generation videoconferencing services? Hardly. While these boxes might be expensive and lack software fault tolerance, the hardware availability is a huge improvement over existing gear. There's redundancy built into the boxes, something sorely lacking in most of today's routers. What's more, their scalability far outstrips existing routers, with chassis sporting higher port speeds and better throughput. Need more ports? Carrier-class routers can combine chassis to increase port count without taking a performance hit.
And then there's future-proofing. Carrier-class gear ultimately aims to integrate with the surrounding optical transports. While none of today's boxes delivers that yet, all the vendors have planted the seeds in their routers. Some even go a step further and deliver the software interfaces needed for application-driven network provisioning. Now that's really the next-generation phone network.
THE NEED FOR SPEED
The alarming growth of the Internet's traffic has ISPs worried. While CPU speeds may double every 18 months, Internet bandwidth grows at four times that rate. More traffic means today's routers need a huge performance boost, and the terabit players claim to deliver just that. While an M40 gigabit router from Juniper Networks ( www.juniper.net ) peaks at 20Gbits/sec, and Cisco's 12000 at 60Gbits/sec, the Pluris ( www.pluris.com ) 2000 scales up to 149Gbits/sec in one box and 19.2Tbits/sec across multiple boxes.
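To see why that gap alarms ISPs, run the numbers. Here's a quick back-of-the-envelope sketch, assuming (since the article doesn't spell it out) that "four times that rate" means traffic doubles four times as often as CPU speeds:

```python
# Back-of-the-envelope comparison of CPU speed vs. Internet traffic growth.
# Assumption (ours, not spelled out in the article): "four times that rate"
# means traffic doubles four times as often as CPUs, i.e. every 4.5 months.

CPU_DOUBLING_MONTHS = 18.0
TRAFFIC_DOUBLING_MONTHS = CPU_DOUBLING_MONTHS / 4.0

def growth_factor(months: float, doubling_period: float) -> float:
    """Multiplicative growth after `months`, given a doubling period."""
    return 2.0 ** (months / doubling_period)

for years in (1, 2, 3):
    months = years * 12
    cpu = growth_factor(months, CPU_DOUBLING_MONTHS)
    traffic = growth_factor(months, TRAFFIC_DOUBLING_MONTHS)
    print(f"{years} yr: CPU x{cpu:.1f}, traffic x{traffic:.0f}, gap x{traffic/cpu:.0f}")
```

Under that reading, traffic grows 256-fold over three years against a four-fold CPU gain - a 64-fold gap, and the reason vendors can't simply wait for faster general-purpose processors.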
However, speed isn't the only problem. Cisco Systems' Internetwork Operating System (IOS) routing code has its share of bugs, particularly in the newer releases. TeleDanmark ( www.teledanmark.dk ), the Danish incumbent carrier, for example, held off on rolling out the Multiprotocol Label Switching (MPLS) capability offered in IOS version 12 for precisely this reason. While testing the code, network manager Jesper Skriver encountered several problems, including a memory leak on one of the line cards. The card reset itself, sometimes as often as every two hours, dropping packets for up to 20 seconds before switching over to a redundant path.
And it's not just line card problems that plague Cisco routers. Skriver found bugs that hit the Route Switch Processor (RSP), for example, which can take down the entire router, preventing it from forwarding packets or calculating new routes. Installing redundant RSPs only made matters worse: the router ended up rebooting on the wrong RSP or hanging during updates between the two cards.
While Cisco has addressed some of these problems in the 12000 series, Skriver says the experience underscores his conviction that Cisco's real strength doesn't lie in the technology. "They charge premium prices for yesterday's products, but they can do that because they've got the best support in the industry," says Skriver. Yet it's precisely that high-end technology that's so critical for enabling operators to stay competitive in delivering next-generation services. Translation? The high-end router market is wide open.
THE PLAYERS
Plenty of players with cutting-edge technology are eager to fill that gap. The best way to start sifting through them is by looking at their clustering ability. With clustering, chassis are grouped together to form a single logical router. Route decisions are made once for the entire cluster, so carriers can reach terabits of throughput without incurring additional router hops. (There's more to carrier-class routing than just terabits of performance, however; see the Figure.)
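As a rough illustration of the benefit, consider the toy model below. The cost figures are purely hypothetical placeholders of ours, not vendor measurements; the point is only that a cluster pays the routing decision once, however many chassis a packet crosses:

```python
# Toy model of clustering: N chassis acting as one logical router means one
# route lookup per transit instead of one per chassis. The costs below are
# hypothetical placeholders, not measurements from any vendor.

LOOKUP_US = 50.0   # assumed cost of one route lookup, in microseconds
FABRIC_US = 5.0    # assumed cost of one fabric/chassis traversal

def transit_cost_us(chassis_crossed: int, clustered: bool) -> float:
    """Total per-packet cost of crossing `chassis_crossed` boxes."""
    lookups = 1 if clustered else chassis_crossed
    return lookups * LOOKUP_US + chassis_crossed * FABRIC_US

for n in (1, 4, 8):
    print(f"{n} chassis: standalone {transit_cost_us(n, False):.0f} us,"
          f" clustered {transit_cost_us(n, True):.0f} us")
```

Whatever the real per-hop numbers, the shape holds: standalone cost grows with every box crossed, while the clustered lookup cost stays flat.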
Using that criterion, seven vendors stand out (see Table 1 ). Three are terabit start-ups: Avici Systems, the first vendor out with a terabit router; Pluris; and Charlotte's Web Networks ( www.cwnt.com ). A fourth vendor, Ironbridge Networks, is developing a terabit router that's expected to ship in the fourth quarter of 2000.
However, the terabit turf battles aren't only going to be fought among newbie router providers. Cisco shipped a terabit router, the 12016, in January 2000. Lucent Technologies (formerly Nexabit) delivered its router in 1999, while Nortel Networks will ship the Versalar Switch Router this month.
The only exception is Everest, from Tellabs' Internetworking Systems Division (formerly Netcore). The box may not reach terabit speeds, but it is orders of magnitude faster than existing gigabit core routers and has the reliability features network architects want.
Conservative buyers might be tempted to dismiss the competition for the market's high end as mere hype. After all, Cisco holds over 80 percent of the router market, and only a vendor with tremendous muscle could encroach on that terrain - or so the argument goes. "There's Cisco and ourselves," says Mukesh Chatter, vice president and general manager of IP products at Lucent. "The rest are just a sideshow."
However, those quick assessments may miss the mark. "We've got the gear, and I can assure you Avici is no sideshow," says John Griebling, vice president of network engineering and operations at Enron Communications, a provider of IP-based services and a current Cisco user.
THE HOLDUP
Having the gear in hand is still pretty unusual. While router vendors have long talked up their new high-end routers, the reality is that product only recently started shipping. The holdup is the silicon: stabilizing the high-speed ASICs has proven a challenge for the industry, particularly as still-unsettled standards continue to change, forcing silicon revisions.
Solving those challenges takes some fancy footwork, which is why, until recently, so few vendors have shipped terabit routers. Tellabs became the first vendor to deliver a multichassis router in 1999 by using Field Programmable Gate Arrays (FPGAs) and off-the-shelf silicon. The catch? The throughput of its Everest product is very limited compared with other multichassis gear. Blame the FPGAs: they aren't as scalable as ASICs, says Charlie Jenkins, vice president of sales and marketing at Solidum ( www.solidum.com ), a manufacturer of high-speed classification engines.
This is why Avici took a different approach. While many companies farm out the back end of Register Transfer Level (RTL) code development for their ASICs, Avici says it kept development in-house, enabling the company to make modifications late in the ASIC development cycle. "I won't pretend that we didn't have bugs in our ASIC," says Peter Chadwick, vice president of product management at Avici, "but with RTL development in-house, we could find them quickly."
Even when vendors do have products shipping, key options may not be available, which makes it difficult to get an accurate picture of what's actually deliverable. For example, the external switching gear that enables Cisco and Tellabs to cluster their chassis isn't yet available, despite the router shipments.
Interfaces are a whole other matter. While vendors might talk about an OC-192 (9.952Gbit/sec) interface, just try ordering one. "Lucent has an OC-192 card, and it works. That's unusual," says Scott Beudoin, senior technologist in data services at Williams Communications. "Most vendors say they have OC-192, but they don't."
HARDWARE, HARD FACTS
So just what level of redundancy and reliability do these products offer network architects? To hear the rhetoric, these routers sound like they're ready to deliver nonstop IP services today.
"You can pull any board out and the machine [the 64000] will continue to operate without interruption," says Lucent's Chatter. Meanwhile, Cisco claims the 12016 offers "carrier-class reliability" and provides rapid and complete recovery from line card, switch-fabric, and power supply failures.
However, the devil's in the details, and here's where network architects need to examine the hardware and software fault tolerance (see Table 2 ). On the hardware front, terabit routers greatly improve reliability. Start with the basics: all terabit routers ship with redundant blowers and power supplies and are Network Equipment-Building System (NEBS)-compliant. NEBS is a Bellcore (now Telcordia) specification that's become the de facto standard for ensuring carrier equipment meets safety, functionality, and interoperability requirements, covering things like resistance to earthquakes and office vibration.
Now move up to the actual routing components. While gigabit routers like Juniper's M40 and Cisco's 7200 offer no redundancy in the routing engine subsystem, that's not the case with terabit routers. Lucent's 64000 and Cisco's 12016, for example, can be configured with redundant I/O modules, switch-fabric boards, and route control processors. Avici says switching is distributed across its I/O modules: lose a module, and traffic is switched by the others. When Pluris ships its 2000, each of the product's I/O modules will be wired to two switch modules. With a total of 16 switch modules, the 2000 can lose half of its switching fabric before a failure takes out a link.
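Pluris's arithmetic can be sanity-checked with a short sketch. The article doesn't describe the actual wiring, so the sketch assumes the 16 switch modules form eight disjoint pairs, with each I/O module homed to one pair:

```python
# Sanity check of the Pluris-style redundancy claim. ASSUMPTION: the 16
# switch modules form eight disjoint pairs, each I/O module wired to one
# pair; the article doesn't describe the actual wiring.

SWITCH_MODULES = 16
PAIRS = [(i, i + 1) for i in range(0, SWITCH_MODULES, 2)]

def link_up(pair: tuple, failed: set) -> bool:
    """A link stays up while at least one of its two switch modules lives."""
    return any(m not in failed for m in pair)

# Lose one module from every pair - half the fabric - and all links stay up.
half_gone = {a for a, _ in PAIRS}
print(all(link_up(p, half_gone) for p in PAIRS))   # True

# But losing the "wrong" two modules (both halves of one pair) kills a link.
print(link_up(PAIRS[0], {0, 1}))                   # False
```

The sketch also exposes the caveat in the claim: "half the fabric" is a best case, since an unlucky two-module failure hitting the same pair still takes out a link.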
What's more, since terabit routers put route calculation and I/O processing on separate modules, a failure in one won't necessarily affect the other. Pull the route control processor, for example, and the 64000 will continue to forward packets, though it won't be able to apply routing updates. That's certainly not the case with the 7000, as Skriver can attest.
And there's the rub. Except for Lucent, none of the vendors shipping gear claims to be able to continue adding or changing routes while a route processing engine is pulled. As for Lucent's claim, Beudoin doesn't buy it. "No one I've run across can claim to have redundant hot concurrent parallel processing router engines," he says.
What they can provide is automatic switchover to a backup route processing engine. This requires either a reboot of the entire system (as with the Everest) or, in the best of cases, a restart of the processor. Either way, expect up to a minute of downtime - and that's just not fast enough. "Providers want 45 milliseconds' switchover," says Chadwick. "When there are 100 OC-192s coming through a box, that's a lot of data to turn off."
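Chadwick's point is easy to quantify. At OC-192 line rate, the traffic dropped during a switchover scales brutally with the outage length - a quick calculation using his example of 100 fully loaded interfaces:

```python
# Traffic lost during a route-processor switchover, using Chadwick's
# example of 100 fully loaded OC-192 (9.952Gbit/sec) interfaces.

OC192_BPS = 9.952e9
LINKS = 100

for label, seconds in (("45 ms target", 0.045), ("1 minute reboot", 60.0)):
    lost_gbytes = OC192_BPS * LINKS * seconds / 8 / 1e9
    print(f"{label}: about {lost_gbytes:,.0f} GBytes dropped")
```

A one-minute reboot drops over a thousand times more data than the 45-millisecond target - roughly 7,500 GBytes against about 6.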
The challenge has to do with the Border Gateway Protocol (BGP), the Internet protocol used for communicating route changes. BGP sessions run over TCP and as such have a lot of "state" associated with them, says Chadwick. No vendor has yet worked out how to hand a second processor the exact state it needs to take over, says Beudoin.
The vendors are certainly working on it, though. Tellabs says it will offer automatic switchover to a backup route processor in the 1.3 code release, due in June 2000, or in the 1.4 release (the Everest currently runs 1.2). The vendor will run two management cards in parallel, with the primary card mirrored onto the secondary. The interfaces appear logically as one, so both cards maintain an accurate view of the BGP session state. Should the primary card fail, the secondary takes over. Pluris expects to offer the same feature in its 20000 later in 2000.
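To make that "state" concrete, here is a minimal sketch of the kind of per-session record a standby processor would have to mirror. The field names are ours and purely illustrative; real implementations track far more:

```python
from dataclasses import dataclass, field

# Minimal sketch of per-peer BGP session state a standby route processor
# would need mirrored to take over transparently. Field names are
# illustrative; real stacks also track timers, queues, and policy.

@dataclass
class BgpSessionState:
    peer_ip: str
    fsm_state: str                  # e.g. "Established"
    # TCP state: a takeover must resume the byte stream exactly, or the
    # peer resets the connection and withdraws every route it learned.
    tcp_snd_nxt: int = 0            # next sequence number we will send
    tcp_rcv_nxt: int = 0            # next sequence number we expect
    # BGP state: routes learned from this peer (prefix -> path attributes).
    adj_rib_in: dict = field(default_factory=dict)

session = BgpSessionState("192.0.2.1", "Established",
                          tcp_snd_nxt=884211, tcp_rcv_nxt=102334)
session.adj_rib_in["203.0.113.0/24"] = {"as_path": [65001, 65010]}
print(session.fsm_state, len(session.adj_rib_in), "route(s) to replay")
```

The TCP counters change with every keepalive and update, which is why mirroring them fast enough for a seamless takeover is the hard part.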
Finally, there's cluster expansion. Reaching more ports is one thing; doing so without disturbing the existing router is something else. Neither Lucent nor Cisco can grow the routing cluster without affecting the operation of the installed router. Other vendors claim to offer hot chassis insertion. Nortel, for example, says that growing the 25000 is a matter of hot-inserting interfaces into each router and then tying them together through the Optera Packet Core product. The routers automatically recognize that they've become part of a cluster and adjust accordingly.
SOFTWARE RELIABILITY
Router software availability is yet another issue. Even Tellabs and Pluris won't be able to recover from a software bug. The problem is simple: because both processors are identical - running identical code on identical data - a bug that crashes one processor will also crash the other. In its defense, Pluris argues that those expectations might be unreasonably high; fault-tolerant systems have never cracked this problem, so there's no reason to expect more from router vendors.
Then there's the issue of in-service software upgrades. Getting to zero downtime means being able to upgrade routing code without affecting router operation. With in-service upgrades, new versions of the routing code are brought online without dropping packets.
Today, the Juniper M40 alone among the gigabit routers has that ability. This is because the M40 runs on top of Unix, which enables new features to be added as the machine is running. According to John Stewart, network engineer at Juniper, the M40 permits in-service upgrades of some drivers, the SNMP functionality, and the routing protocols.
Among the terabit pack, Lucent, Charlotte's Web, and Nortel Networks come closest to this functionality. Lucent claims it can handle limited in-service upgrades by letting users add a feature, like a new protocol, without taking down the router. The vendor claims that it can also upgrade BGP and enhance SNMP functionality without affecting router operation.
Protected mode memory is another matter altogether. With the M40 running on Unix, Juniper can isolate processes from each other. This way, a corrupted memory pointer, for example, won't crash the entire set of code running on that processor.
Cisco is said to be adding a similar feature to its IOS Extended Network Architecture (ENA). Lucent claims to use protected mode memory. Pluris expects to add protected mode memory shortly but contends that, due to its fault-tolerant architecture, the move isn?t critical, as it can reboot the processor without dropping packets.
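The payoff of that isolation is easy to demonstrate in miniature. In the sketch below, a deliberately failing task takes down only its own process while its sibling and the supervisor keep running - the same boundary a monolithic router image lacks:

```python
import multiprocessing as mp

# Miniature demonstration of process isolation: one crashing task kills
# only its own process. In a monolithic image with one address space,
# the same bug could take everything down.

def routing_task(name: str, crash: bool) -> None:
    if crash:
        raise RuntimeError(f"{name}: corrupted pointer")  # stand-in for a bug
    print(f"{name}: still running")

if __name__ == "__main__":
    tasks = [mp.Process(target=routing_task, args=("bgp", True)),
             mp.Process(target=routing_task, args=("snmp", False))]
    for t in tasks:
        t.start()
    for t in tasks:
        t.join()
        print(f"pid {t.pid} exited with code {t.exitcode}")  # 1 for the crash
```

The "bgp" process dies with a traceback and a nonzero exit code; the "snmp" process and the supervisor never notice.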
FUNNY NUMBERS
The scalability story is only slightly better than the reliability pitch. While these vendors claim performance that is orders of magnitude better than existing enterprise devices, getting a fix on exact port counts is another matter. Vendors play three different types of shell games to boost their performance claims:
Game 1: Measuring box performance by the speed of the internal architecture. Lucent claims to have 6.4Tbits/sec of throughput, but that's the internal speed of the box. Lucent's Chatter argues that looking at the internal capacity of the box is key to getting a sense of its scalability. Skeptics may have another read on the matter: inflating internal performance numbers is relatively easy to do. "I think this is a meaningless number," says Avici's Chadwick. Sore words from a competitor? Perhaps. Then again, Avici has nothing to lose by going with internal numbers; Chadwick says Avici's bus architecture runs around 32Tbits/sec.
Game 2: Tracking the packets. Counting packets can be a valuable measure, giving an indication of the actual forwarding capacity of the box. Of course, this assumes vendors count packets the same way, which is hardly ever the case.
Critics claim Cisco double-counts packets - once as they enter the chassis and once as they leave - while other vendors count only inbound packets. Then there's the issue of packet size. Some vendors count with minimum-size packets; others use longer packets, which place far less strain on the router.
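The packet-size game is worth quantifying. The packets per second a router must sustain at a given line rate is roughly the rate divided by the packet size in bits - a simplification that ignores framing overhead, but good enough to show the spread:

```python
# Why packet size matters in forwarding claims: required lookups per second
# is roughly line_rate / (packet_size * 8). Framing overhead is ignored,
# so treat these as rough figures.

OC48_BPS = 2.488e9

def mpps(line_rate_bps: float, packet_bytes: int) -> float:
    """Millions of packets per second at a given line rate and packet size."""
    return line_rate_bps / (packet_bytes * 8) / 1e6

for size in (40, 576, 1500):   # minimum TCP/IP, a common size, and large
    print(f"{size:>4}-byte packets on one OC-48: {mpps(OC48_BPS, size):.2f} Mpps")
```

A vendor quoting 1,500-byte packets asks its lookup engines for roughly a thirty-seventh of the work of one quoting 40-byte packets - and double counting doubles the headline number on top of that.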
Game 3: Counting performance by cluster size. Unlike enterprise boxes, all of the carrier-class routers can be grouped together to form a single, logical box. Some vendors, like Pluris, cite the capacity for the entire cluster as the capacity for the system, which in the case of the 20000 series would run 184Tbits/sec.
The best approach? Measure routers by the aggregate I/O capacity of a single chassis. At the end of the day, I/O is the only thing a customer can actually buy; system performance isn't for sale. The math is easy - just multiply the interfaces by their line rate. By this measure, it's clear that none of the routers can actually handle terabits of data: Lucent, for example, tops out at 159Gbits/sec, Pluris and Cisco at 149Gbits/sec.
Think of it this way: except for Tellabs, the vendors more than quintuple the eight OC-48 ports supported on a Juniper M40. Avici claims to reach 40 OC-48s in a single box, or nearly 560 OC-48s per cluster. Tellabs' use of FPGAs limits the box to just four OC-48s per chassis, forcing it to rely on clustering to reach up to 256 OC-48s.
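That yardstick fits in a few lines. The port counts below echo the figures quoted above and should be read as vendor claims, not verified numbers; the helper simply multiplies interfaces by line rate:

```python
# The article's recommended yardstick: aggregate I/O of a single chassis,
# interfaces multiplied by line rate. Port counts are those quoted above
# and should be read as vendor claims, not verified figures.

OC48_GBPS = 2.488   # OC-48 line rate in Gbits/sec

chassis_oc48_ports = {
    "Avici (claimed)": 40,
    "Juniper M40": 8,
    "Tellabs Everest": 4,
}

for box, ports in chassis_oc48_ports.items():
    print(f"{box}: {ports} x OC-48 = {ports * OC48_GBPS:.1f} Gbits/sec")
```

Even the biggest claimed single-chassis configuration lands around 100Gbits/sec of OC-48 I/O - an order of magnitude short of a terabit.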
CLUSTERING MUSCLE
Out in the real world, port counts don't scale as neatly as they do on paper, where you simply multiply the maximum ports per chassis by the number of chassis. Reaching the maximum cluster size or level of reliability typically means cutting into port counts.
Start with clustering. Charlotte's Web Networks, Lucent, and Nortel use interface ports to cluster their routers, reducing the port density of the box. Charlotte's Web Networks' Aranea, for example, clusters 32 chassis using special modules that consume up to 25 percent of the box's interfaces. The other vendors claim to scale through their switching fabric rather than through interfaces.
Port counts also need to be viewed in the context of space in the Point of Presence (POP). With space at a premium, operators aren't concerned only with getting high port densities per chassis; they also want high port density for the seven-foot racks that house the gear. Some of that is obvious: with 159Gbits/sec of throughput, one 64000 matches more than 15 Everest boxes.
However, there are some less obvious factors, like chassis width. Here again the FPGA design hurts the Everest: its chassis is 23 inches wide instead of the usual 19 inches, says Joe Durkin, senior product manager at Tellabs. The problem? Some installations only have 19-inch racks.
Then consider the impact resilience has on port counts. Cisco's 12016, for example, can be equipped with a redundant route processor card, but that means burning a slot that could otherwise carry I/O. Tellabs has a similar problem. The Everest sports four interface cards for I/O processing, each handling traffic from four line cards. To gain redundancy in the I/O processing modules, network architects can designate an interface card as a backup, but in doing so they can no longer use all four I/O processing modules for traffic. "In practice, very few customers take advantage of the redundancy for that very reason," says Durkin.
Finally, check out the distances between nodes in a cluster. Some vendors rely on Synchronous Digital Hierarchy (SDH) to extend the distance between clustered nodes. Tellabs, for example, can hit 26 kilometers between nodes. This enables operators to ensure greater resilience by locating nodes on different floors or in different buildings. Other vendors operate under much tighter distance constraints. Avici requires nodes to be connected directly together.
FUTURES
Looking forward, terabit router vendors are aiming to integrate their devices with other network elements. At the bottom layer, that means hooking into optical devices. The idea is that the terabit router and the optical switch will communicate using MPLS, supported by all the routers in the review. The synthesis will let operators do things like automatically provision the network based on layer-3 intelligence.
At the high end, it means handing network control to software: end-user applications will be able to request the network services they require. A videoconferencing package, for example, could ask for bandwidth to be reserved for the duration of a session.
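Nothing like this is standardized yet, but the shape of such a request is easy to imagine. The sketch below is purely hypothetical - the class, fields, and function are our inventions for illustration, not any vendor's interface:

```python
from dataclasses import dataclass

# Purely hypothetical sketch of application-driven provisioning. The class,
# fields, and function are inventions for illustration, not any vendor's
# actual interface.

@dataclass
class BandwidthRequest:
    application: str        # who is asking
    bandwidth_mbps: float   # how much to reserve
    duration_s: int         # for how long
    dst: str                # far end of the reservation

def request_reservation(req: BandwidthRequest) -> str:
    # A real system would hand this to the network's provisioning layer,
    # which would map it onto MPLS paths; here we just acknowledge it.
    return (f"reserve {req.bandwidth_mbps} Mbit/s to {req.dst} "
            f"for {req.duration_s}s on behalf of {req.application}")

print(request_reservation(
    BandwidthRequest("videoconf", bandwidth_mbps=6.0, duration_s=3600,
                     dst="branch-office-router")))
```

However the interface ends up looking, the hard requirement is the one discussed next: the network has to act on such requests in seconds, not minutes.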
Key to this function is the ability to respond rapidly to such automated commands. That's a major reason Enron Communications' Griebling is looking at Avici over Cisco's 12016. "The programmatic access in the 12016 is rudimentary," he says. "It's a matter of comparing the sub-five-minute response times of the 12016 to the subsecond response times of the Avici." With performance differences like that, it's easy to see why the next-generation phone network is still a wide-open market.
David Greenfield, international technology editor, can be reached at dgreenfi@cmp.com.