Technology Stocks : MRV Communications (MRVC) opinions?


To: signist who wrote (19960), 4/3/2000 2:50:00 AM
From: signist
 
A Lesson in Carrier-Class Confusion <Charlotte's Web>

Chart of "The Players"

img.cmpnet.com

From Network Magazine, March 2000 (NPN: New Public Network):

networkmagazine.com

"CLUSTERING MUSCLE

Out in the real world, port counts don?t
scale as neatly as they do when simply
adding the maximum number of ports per
chassis together. Reaching the maximum
cluster size or level of reliability typically
means cutting into port counts.

Start with clustering. Charlotte?s Web
Networks, Lucent, and Nortel use interface
ports to cluster their routers, reducing the
port density on the box. Charlotte?s Web
Networks? Aranea, for example, clusters 32
chassis using special modules that consume
up to 25 percent of the box?s interfaces. The
other vendors claim to scale using their
switching fabric rather than interfaces.

Port counts also need to be looked at in the
context of the space in the Point of Presence
(POP). With space at a premium, operators
aren?t concerned only with getting high port
densities per chassis; they also want high
port densities for the seven-foot racks that
house the gear. Some of that is obvious.
With 159Gbits/sec of throughput, one 64000
will match over 15 Everest boxes. "
................................................................................

Terabit Routers: A
Lesson in Carrier-Class
Confusion

Next-generation IP services
demand next-generation routing.
Will terabit routers fit the bill?
by David Greenfield
Carrier class: that's how seven router vendors describe their newest high-end creations. These new terabit packet blasters are supposed to enable ISPs, the carriers of the new millennium, to transform the Internet into the next-generation phone network: fault tolerant, adaptable, and perfectly capable of beaming high-end videoconferencing or broadcast television into the PCs across the enterprise. All of which should leave network managers, the recipients of these nifty network services, asking the question: Are these so-called carrier-class routers tough enough?

The answer is anything but a clear affirmative. Despite months of hype, product deliveries are still in the early stages, and even shipped products lack key features, all of which makes definitive assessments difficult. However, if the views of early adopters are any indication, terabit routers come up short on what carriers really want: the reliability and uptime common to telephone networks.

"Carrier-class is a term thrown around by a lot of people," says Jason Martin, director of technology at Williams Communications ( www.wilcom.com ), "but I don't think [terabit routers] are there today." Martin is after routers that simply don't fail. The test? "Can you yank all cards out, knock machines down, and still maintain the network?" he asks.

That might be a bit extreme, but none of the vendors shipping product, not Lucent, Avici Systems ( www.avici.com ), nor Cisco Systems, are delivering routers that can keep purring as critical cards are pulled. Nor can they handle updating the router code without affecting the router's operations. Tough standards? Absolutely. However, anything touted as a carrier-class product should meet carrier-class expectations, especially when it has a carrier-class price tag.

Consider this: Most of these boxes start at roughly $6,000 per OC-3 (155Mbit/sec) interface when carrying Packet over SONET (POS) and $26,000 for ATM traffic. But that's not even the high end. Purchase an OC-192 (9.952Gbit/sec) interface, and the per-port price reaches over $200,000.
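
To put those per-port figures on a common footing, divide price by line rate. A back-of-the-envelope sketch (the list prices are the article's; the per-megabit arithmetic is ours):

    # Rough cost-per-megabit comparison using the article's cited prices.
    ports = {
        "OC-3 POS": (6_000, 155),      # price ($), line rate (Mbit/sec)
        "OC-3 ATM": (26_000, 155),
        "OC-192":   (200_000, 9_952),
    }
    for name, (price, mbps) in ports.items():
        print(f"{name}: ${price / mbps:,.2f} per Mbit/sec")
    # OC-3 POS: ~$39/Mbit, OC-3 ATM: ~$168/Mbit, OC-192: ~$20/Mbit

By this measure the OC-192 port, for all its sticker shock, is actually the cheapest bandwidth in the lineup.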

Does this mean network managers can forget about those next-generation videoconferencing services? Hardly. While these boxes might be expensive and lack software fault tolerance, the hardware availability is a huge improvement over existing gear. There's redundancy built into the boxes, something sorely lacking in most of today's routers. What's more, their scalability far outstrips existing routers, with chassis sporting higher port speeds and better throughput. Need more ports? Carrier-class routers can combine chassis to increase port count without taking a performance hit.

And then there's future-proofing. Carrier-class gear aims to ultimately integrate with surrounding optical transports. While none of today's products offer that, all the vendors have planted the seeds in their routers. Some even go a step further and deliver the necessary software interfaces to provide application-driven network provisioning. Now that's really the next-generation phone network.

THE NEED FOR SPEED

The alarming growth of the Internet's traffic rates has ISPs worried. While CPU speeds may double every 18 months, Internet bandwidth grows at four times that rate. More traffic means today's routers need a huge performance boost. Terabit players claim to deliver just that. While an M40 gigabit router from Juniper Networks ( www.juniper.net ) peaks out at 20Gbits/sec, and the Cisco 12000 at 60Gbits/sec, the Pluris ( www.pluris.com ) 20000 scales up to 149Gbits/sec in one box and 19.2Tbits/sec across multiple boxes.
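
The compounding behind that worry is worth spelling out. A minimal sketch, assuming "four times that rate" means bandwidth doubles every 4.5 months (that reading is ours; the 18-month and 4x figures are the article's):

    # If CPU speed doubles every 18 months and bandwidth grows four times as
    # fast (assumed: doubles every 4.5 months), the gap compounds quickly.
    cpu_doubling_months = 18.0
    bw_doubling_months = cpu_doubling_months / 4
    months = 36
    cpu_growth = 2 ** (months / cpu_doubling_months)   # 2 doublings -> 4x
    bw_growth = 2 ** (months / bw_doubling_months)     # 8 doublings -> 256x
    print(f"After {months} months: CPU x{cpu_growth:.0f}, bandwidth x{bw_growth:.0f}")

Three years out, traffic has outgrown processors by a factor of 64, which is why faster silicon alone can't save the day.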

However, speed isn't the only problem. Cisco Systems' Internetwork Operating System (IOS) routing code has its share of bugs, particularly in the newer releases of code. TeleDanmark ( www.teledanmark.dk ), the Danish incumbent administration, for example, held off on rolling out the Multiprotocol Label Switching (MPLS) capability offered in IOS version 12 for precisely this reason. While testing the code, network manager Jesper Skriver encountered several problems, including a memory leak on one of the line cards. The card reset itself, sometimes as often as every two hours, dropping packets for up to 20 seconds before switching over to a redundant path.

And it's not just line card problems that bother Cisco routers. Skriver found bugs that hit the Route Switch Processor (RSP), for example, which can take down the entire router, preventing it from forwarding packets or calculating new routes. Installing redundant RSPs only made matters worse. The router ended up rebooting on the wrong RSP or hanging during updates between the two cards.

While Cisco has addressed some of these problems in the 12000 series, Skriver says the experience underscores his conviction that Cisco's real strength doesn't lie in the technology. "They charge premium prices for yesterday's products, but they can do that because they've got the best support in the industry," says Skriver. Yet it's precisely that high-end technology that's so critical for enabling operators to stay competitive in delivering next-generation service. Translation? The high-end router market is wide open.

THE PLAYERS

Loads of players with cutting-edge technology are certainly willing to fill that gap. The best way to start sifting through them is by looking at their clustering ability. With clustering, chassis are grouped together to form a single router. Route decisions are made once for the entire cluster so that carriers can reach terabits of throughput without incurring additional router hops. (There's more to carrier-class routing than just terabits of performance, however; see Figure.)

Using that criterion, seven vendors stand out (see Table 1). Three are terabit start-ups: Avici Systems, the first vendor out with a terabit router; Pluris; and Charlotte's Web Networks ( www.cwnt.com ). A fourth vendor, Ironbridge Networks, is developing a terabit router that's expected to ship in the fourth quarter of 2000.

However, the terabit turf battles aren't only going to be fought among newbie router providers. Cisco shipped a terabit router, the 12016, in January 2000. Lucent Technologies (formerly Nexabit) delivered its router in 1999, while Nortel Networks will ship the Versalar Switch Router this month.

The only exception is Everest from Tellabs' Internetworking Systems Division (formerly Netcore). The box may not reach terabit speeds, but it is orders of magnitude faster than existing gigabit core routers and has the reliability features network architects want.

Conservative buyers might be tempted to dismiss the competition for the market's high end as just hype. After all, Cisco holds over 80 percent of the router market. Only a vendor with tremendous muscle would be able to encroach on that terrain, or so the argument goes. "There's Cisco and ourselves," says Mukesh Chatter, vice president and general manager of IP products at Lucent. "The rest are just a sideshow."

However, those quick assessments may miss the mark. "We've got the gear, and I can assure you Avici is no sideshow," says John Griebling, vice president of network engineering and operations at Enron Communications, a provider of IP-based services and a current Cisco user.

THE HOLDUP

Having the gear in hand is still pretty unusual. While router vendors have long talked about their new high-end routers, the reality is that product only recently started shipping. The problem is the silicon. Stabilizing the high-speed ASICs has proven a challenge for the industry, particularly as the standards being cast into silicon continue to change, forcing revisions.

Solving those challenges means some fancy footwork, which is why, until recently, so few vendors have shipped terabit routers. Tellabs became the first vendor to deliver a multichassis router in 1999 by using Field Programmable Gate Arrays (FPGAs) and off-the-shelf silicon. The problem? The throughput of its Everest product is very limited in comparison to other multichassis gear. The culprit is the FPGAs: they aren't as scalable as ASICs, says Charlie Jenkins, vice president of sales and marketing at Solidum ( www.solidum.com ), a manufacturer of high-speed classification engines.

This is why Avici took a different approach. While many companies farm out the back end of Register Transfer Level (RTL) code development for their ASICs, Avici says it has kept development in-house, enabling the company to make modifications later in the ASIC development cycle. "I won't pretend that we didn't have bugs in our ASIC," says Peter Chadwick, vice president of product management at Avici, "but with RTL development in-house, we could find them quickly."

Even when vendors do have products shipping, key options may not be available, which makes it difficult to get an accurate picture of what's actually deliverable. For example, the external switching gear that enables Cisco and Tellabs to cluster their chassis isn't yet available, despite the router shipments.

Interfaces are a whole other matter. While vendors might talk about an OC-192 (9.952Gbit/sec) interface, just try ordering one. "Lucent has an OC-192 card, and it works. That's unusual," says Scott Beudoin, senior technologist in data services at Williams Communications. "Most vendors say they have OC-192, but they don't."

HARDWARE, HARD FACTS

So just what level of redundancy and reliability do these products offer network architects? To hear the rhetoric, these routers sound like they're ready to deliver nonstop IP services today.

"You can pull any board out and the machine [the 64000] will continue to operate without interruption," says Lucent's Chatter. Meanwhile, Cisco claims the 12016 offers "carrier-class reliability" and provides rapid and complete recovery from switch-fabric, line card, and power supply failures.

However, the devil's in the details, and here's where network architects need to examine the hardware and software fault tolerance (see Table 2). On one hand, the terabit router greatly improves hardware reliability. Concerning the basics, all terabit routers ship with redundant blowers and power supplies and are Network Equipment Building Systems (NEBS)-compliant. NEBS is a Bellcore (now Telcordia) specification that's become the de facto standard for ensuring that carrier equipment meets safety, functionality, and interoperability levels. This covers things like earthquake and office vibration resistance.

Now move up to the actual routing components. While gigabit routers like Juniper's M40 and Cisco's 7200 offer no redundancy in the routing engine subsystem, that's not the case with terabit routers. Lucent's 64000 and Cisco's 12016, for example, can be configured with redundant I/O modules, switch-fabric boards, and route control processors. Avici says switching is distributed across I/O modules. Lose a module, and traffic is switched by the other modules. When Pluris ships its 20000, each of the product's I/O modules will be wired to two switch modules. With a total of 16 switch modules, the 20000 can lose half of its switching fabric before a failure will take out a link.
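
A minimal sketch of why that dual-homing claim can hold: if each I/O module is wired to two distinct switch modules, a link dies only when both of its switch modules fail, so a friendly failure pattern can remove eight of the 16 modules without cutting a single link. The pairing below is hypothetical; Pluris hasn't published its wiring.

    # Hypothetical wiring: I/O module i is homed to switch modules (2i, 2i+1).
    SWITCH_MODULES = 16
    pairs = [(2 * i, 2 * i + 1) for i in range(SWITCH_MODULES // 2)]

    def link_up(failed, pair):
        # A link survives as long as at least one of its switch modules lives.
        return not all(m in failed for m in pair)

    # Fail one module from every pair: half the fabric gone, all links still up.
    failed = {a for a, _ in pairs}
    assert len(failed) == SWITCH_MODULES // 2
    assert all(link_up(failed, p) for p in pairs)
    print("lost", len(failed), "of", SWITCH_MODULES, "switch modules; links up")

The flip side, of course, is that an unlucky pattern of just two failures (both halves of one pair) takes a link down, which is why "can lose half" is a best case, not a guarantee.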

What's more, since terabit routers separate route calculation and I/O processing onto different modules, a failure in one won't necessarily affect the other. For example, pull the route control processor, and the 64000 will continue to forward packets, though it may not be able to implement routing updates. That's certainly not the case with the 7000, as Skriver can attest.

And there's the rub. Except for Lucent, none of the vendors shipping gear claim to be able to continue adding or changing routes when a route processing engine is pulled. As for Lucent's claim, Beudoin doesn't buy it. "No one I've run across can claim to have redundant hot concurrent parallel processing router engines," he says.

What they can provide is automatic switchover to a backup route processing engine. This requires either a reboot of the entire system (as with the Everest) or, in the best of cases, a restart of just the processor. Either way, expect up to a minute of downtime, and that's just not fast enough. "Providers want 45 milliseconds' switchover," says Chadwick. "When there are 100 OC-192s coming through a box, that's a lot of data to turn off."
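
Chadwick's "a lot of data" is easy to quantify. The port count and line rate are his; the arithmetic is ours:

    # Data blacked out during a failover, at 100 fully loaded OC-192 ports.
    OC192_GBPS = 9.952
    ports = 100
    for label, seconds in [("45 ms target", 0.045), ("1 minute reboot", 60.0)]:
        gbytes_lost = OC192_GBPS * ports * seconds / 8
        print(f"{label}: ~{gbytes_lost:,.1f} GBytes dropped")
    # 45 ms target: ~5.6 GBytes; 1 minute reboot: ~7,464 GBytes

A minute-long switchover throws away three orders of magnitude more traffic than the 45-millisecond target carriers inherited from SONET protection switching.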

The challenge has to do with the Border Gateway Protocol (BGP), the Internet protocol used for communicating route changes. BGP sessions run over TCP and as such have a lot of "state" associated with them, says Chadwick. Tracking that state precisely enough for a second processor to take over hasn't been accomplished by any of the vendors, says Beudoin.

The vendors are certainly working on it, though. Tellabs says it will offer automatic switchover to a backup route processor in the 1.3 code release, due in June 2000, or in the 1.4 release (the Everest currently runs 1.2). The vendor will run two management cards in parallel, with the primary card mirrored onto the secondary card. The interfaces appear logically as one, so both cards are able to maintain an accurate state of the BGP session. Should the primary card fail, the secondary will take over. Pluris expects to offer the same feature in its 20000 later in 2000.
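
In outline, the mirroring scheme Tellabs describes looks something like the sketch below: every inbound BGP/TCP event is applied to both cards before the peer sees it acknowledged, so the standby never lags. This is our illustrative reconstruction, not Tellabs code; the class and field names are hypothetical.

    # Illustrative sketch of primary/standby BGP session state mirroring.
    from dataclasses import dataclass, field

    @dataclass
    class BgpSessionState:
        tcp_seq: int = 0                             # TCP state that must stay in sync
        routes: dict = field(default_factory=dict)   # prefix -> path attributes

    class MirroredRouteProcessor:
        def __init__(self):
            self.primary = BgpSessionState()
            self.standby = BgpSessionState()

        def on_update(self, prefix, attrs, seg_len):
            # Apply each event to BOTH cards before acking the peer, so a
            # failover never exposes state the peer hasn't already seen.
            for card in (self.primary, self.standby):
                card.tcp_seq += seg_len
                card.routes[prefix] = attrs

        def fail_over(self):
            self.primary = self.standby              # standby holds identical state
            return self.primary

    rp = MirroredRouteProcessor()
    rp.on_update("192.0.2.0/24", "AS_PATH 701 1239", seg_len=52)
    assert rp.primary.routes == rp.standby.routes
    print("after failover:", rp.fail_over().routes)

The hard part in a real router is keeping the TCP sequence numbers and timers, not just the route table, perfectly synchronized; that is exactly the state Beudoin says no one has cracked.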

Finally, there's cluster expansion. Reaching more ports is one thing, but doing that without affecting the operation of the existing router is something else. Neither Lucent nor Cisco can grow the routing cluster without affecting the operation of the installed router. Other vendors claim to offer chassis insertion. For example, Nortel says that growing the 25000 is a matter of hot-inserting interfaces into each router and then tying them together through the Optera Packet Core product. The routers automatically identify that they've become part of a cluster and adjust accordingly.

SOFTWARE RELIABILITY

Router software availability is yet another issue. Even Tellabs and Pluris won't be able to recover from a software bug. The problem is simple: Because both processors are identical, running identical code on identical data, a bug that crashes one processor will also crash the other. In its defense, Pluris argues that those expectations might be unreasonably high; fault-tolerant systems have never been able to crack this problem, so there's no reason to expect any more from router vendors.

Then there's the issue of in-service software upgrades. Getting to zero downtime means being able to upgrade routing code without affecting router operation. With in-service upgrades, new versions of the routing code are brought online without dropping packets.

Today, the Juniper M40 alone among the
gigabit routers has that ability. This is
because the M40 runs on top of Unix, which
enables new features to be added as the
machine is running. According to John
Stewart, network engineer at Juniper, the
M40 permits in-service upgrades of some
drivers, the SNMP functionality, and the
routing protocols.

Among the terabit pack, Lucent, Charlotte's Web, and Nortel Networks come closest to this functionality. Lucent claims it can handle limited in-service upgrades by letting users add a feature, like a new protocol, without taking down the router. The vendor claims that it can also upgrade BGP and enhance SNMP functionality without affecting router operation.

Protected mode memory is another matter altogether. With the M40 running on Unix, Juniper can isolate processes from each other. This way, a corrupted memory pointer, for example, won't crash the entire set of code running on that processor.
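
The mechanism is ordinary OS process isolation, and it's easy to demonstrate in miniature. A sketch on POSIX (the subsystem names are hypothetical; this illustrates the principle, not Juniper's software):

    # Run subsystems as separate OS processes so a crash in one (here, a
    # simulated segfault) leaves the others running.
    import multiprocessing as mp
    import os, signal

    def snmp_agent():
        os.kill(os.getpid(), signal.SIGSEGV)   # simulate a corrupted pointer

    def bgp_task():
        pass                                    # healthy subsystem, exits cleanly

    if __name__ == "__main__":
        procs = {"snmp": mp.Process(target=snmp_agent),
                 "bgp": mp.Process(target=bgp_task)}
        for p in procs.values():
            p.start()
        for p in procs.values():
            p.join()
        for name, p in procs.items():
            print(f"{name}: exit code {p.exitcode}")   # snmp -11, bgp 0

A monolithic image, by contrast, shares one address space, so the same bad pointer takes everything down with it.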

Cisco is said to be adding a similar feature to its IOS Extended Network Architecture (ENA). Lucent claims to use protected mode memory. Pluris expects to add protected mode memory shortly but contends that, due to its fault-tolerant architecture, the move isn't critical, as it can reboot the processor without dropping packets.

FUNNY NUMBERS

The scalability story is only slightly better
than the reliability pitch. While these
vendors claim performance that is orders of
magnitude better than existing enterprise
devices, getting a fix on exact port counts is
another matter. Vendors play three different
types of shell games to boost their
performance claims:

Game 1: Measuring box performance by the speed of the internal architecture. Lucent claims to have 6.4Tbits/sec of throughput, but that's the internal speed of the box. Lucent's Chatter argues that looking at the internal capacity of the box is key to getting a sense of its scalability. Skeptics may have another read on the matter: Inflating performance numbers is relatively easy to do. "I think this is a meaningless number," says Avici's Chadwick. Sore words from a competitor? Perhaps. Then again, Avici has nothing to lose by going with internal numbers: Chadwick says Avici's bus architecture runs around 32Tbits/sec.

Game 2: Tracking the packets. Counting packets can be a valuable measure, giving an indication of the actual forwarding capacity of the box. Of course, this assumes vendors count packets the same way, which is hardly ever the case.

Critics claim Cisco double-counts the packets: once as they enter the chassis and once as they leave. Other vendors count only inbound packets. Then there's the issue of packet sizes. Some vendors count with minimum-size packets; others use longer packet lengths, placing less of a strain on the router.
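
These counting choices swing the headline numbers enormously. A quick sketch for a single OC-48 port (line rate is standard; the packet sizes are common test values, not any vendor's disclosed method):

    # Packets-per-second claims for one OC-48 port under different counting rules.
    OC48_BPS = 2.488e9
    for pkt_bytes in (40, 1500):     # minimum-size IP packet vs a long packet
        pps = OC48_BPS / (pkt_bytes * 8)
        print(f"{pkt_bytes}-byte packets: {pps / 1e6:.2f} Mpps in, "
              f"{2 * pps / 1e6:.2f} Mpps if counted both in and out")
    # 40B: ~7.78 Mpps (15.55 double-counted); 1500B: ~0.21 Mpps

Same port, same wire, and the quoted figure can vary by a factor of 75 depending on packet size and whether packets are counted once or twice.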

Game 3: Counting performance by cluster
size. Unlike enterprise boxes, all of the
carrier-class routers can be grouped together
to form a single, logical box. Some vendors,
like Pluris, cite the capacity for the entire
cluster as the capacity for the system, which
in the case of the 20000 series would run
184Tbits/sec.

The best approach? Measure routers by the aggregate I/O capacity of a single chassis. At the end of the day, I/O is what customers actually buy, not internal system performance. The math here is easy: just multiply the interfaces by their line rate. Using this approach, it's clear that none of the routers can actually handle terabits of data. Lucent, for example, tops out at 159Gbits/sec, Pluris and Cisco at 149Gbits/sec.
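
That multiplication, spelled out (the port mix below is an illustrative reading of the article's figures, not a vendor spec sheet):

    # Aggregate I/O capacity of one chassis: interface count times line rate.
    RATES_GBPS = {"OC-48": 2.488, "OC-192": 9.952}

    def chassis_capacity(ports):
        return sum(count * RATES_GBPS[kind] for kind, count in ports.items())

    # 64 OC-48s works out to ~159 Gbits/sec, matching the Lucent figure cited.
    print(f"{chassis_capacity({'OC-48': 64}):.0f} Gbits/sec")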

Think of it this way: Except for Tellabs, the vendors more than quintuple the eight OC-48 ports supported on a Juniper M40. Avici actually claims to reach 40 OC-48s in a single box, or nearly 560 OC-48s per cluster. Tellabs' use of FPGAs limits the box to just four OC-48s per chassis, forcing it to rely on clustering to reach up to 256 OC-48s.

CLUSTERING MUSCLE

Out in the real world, port counts don't scale as neatly as they do when simply adding the maximum number of ports per chassis together. Reaching the maximum cluster size or level of reliability typically means cutting into port counts.

Start with clustering. Charlotte's Web Networks, Lucent, and Nortel use interface ports to cluster their routers, reducing the port density on the box. Charlotte's Web Networks' Aranea, for example, clusters 32 chassis using special modules that consume up to 25 percent of the box's interfaces. The other vendors claim to scale using their switching fabric rather than interfaces.

Port counts also need to be looked at in the context of the space in the Point of Presence (POP). With space at a premium, operators aren't concerned only with getting high port densities per chassis; they also want high port densities for the seven-foot racks that house the gear. Some of that is obvious. With 159Gbits/sec of throughput, one 64000 will match over 15 Everest boxes.
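
Both effects are quick to check. The 25 percent clustering tax and the four-OC-48 Everest limit come from the article; the arithmetic (and the illustrative 32-port chassis) is ours:

    OC48_GBPS = 2.488

    # The clustering tax: up to 25 percent of interfaces burned on cluster links.
    raw_ports = 32                   # illustrative chassis, not a vendor figure
    print(f"{raw_ports} raw ports -> {raw_ports * 0.75:.0f} usable")

    # One 159Gbit/sec 64000 vs Everest chassis limited to four OC-48s apiece.
    print(f"{159 / (4 * OC48_GBPS):.1f} Everest chassis to match one 64000")
    # -> ~16.0, i.e., "over 15 Everest boxes"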

However, there are some less obvious factors, like chassis width. Here again the FPGA design hurts the Everest. The Everest's chassis is 23 inches wide instead of the usual 19 inches, says Joe Durkin, senior product manager at Tellabs. The problem? Some installations only have racks 19 inches wide.

Then consider the impact resilience will have on the port counts. Cisco's 12016, for example, can be equipped with redundant route processor cards, but that means burning a slot for I/O ports. Tellabs has a similar problem. The Everest sports four interface cards for I/O processing. Each interface card handles traffic coming in from four line cards. To gain redundancy in the I/O processing modules, network architects can designate an interface card as a backup, but in doing so they can no longer use all four I/O processing modules. "In practice, very few customers take advantage of the redundancy for that very reason," says Durkin.

Finally, check out the distances between
nodes in a cluster. Some vendors rely on
Synchronous Digital Hierarchy (SDH) to
extend the distance between clustered
nodes. Tellabs, for example, can hit 26
kilometers between nodes. This enables
operators to ensure greater resilience by
locating nodes on different floors or in
different buildings. Other vendors operate
under much tighter distance constraints.
Avici requires nodes to be connected
directly together.

FUTURES

Looking forward, terabit router vendors are
aiming to integrate their devices with other
network elements. At the bottom layer, that
means hooking into optical devices. The idea
is that the terabit router and the optical
switch will communicate using MPLS,
supported by all the routers in the review.
The synthesis will let operators do things like
automatically provision the network based
on layer-3 intelligence.

At the high end, it means giving network
control to software; end-user applications
will be enabled to request required network