
To: Frank A. Coluccio who wrote (43), 10/10/1999 9:05:00 PM
From: ftth
 
Distributing the load: distributed load balancing in server farms

Network World, August 6, 1999

By Dwight Gibbs


Since I wrote about local load balancing last week, it seems only natural that this week's column takes the
next step and focuses on distributed load balancing.

First, let's define the term. Distributed load balancing involves using two or more geographically
dispersed server farms to serve content. The idea is to have some or all of the same content at each
farm. Why would anyone do this? There are three main reasons:

Customer Experience: Ideally, this places content closer to the customer. The goal is to route each request to the nearest server farm, which should, in theory, mean quicker access and a better customer experience. Sounds great, no?

Redundancy: Another reason for doing the distributed load-balancing gig is redundancy. In the event of a
catastrophic failure at one server farm, all traffic would be rerouted to another farm.

Scalability: Using distributed load balancing, you can bring servers at different locations online to handle
increased loads.

In the old days (18 to 24 months ago), the tools for distributed load balancing were, quite frankly, pretty bad. Cisco had its Distributed Director. It worked, but it relied solely on Border Gateway Protocol (BGP), so it sent customers to the server farm the fewest hops away from their ISP. It did not factor congestion into the equation, which meant customers could be routed across hops that were totally overloaded when a longer but less-taxed path would have been faster.
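To make the hop-count problem concrete, here is a rough sketch in Python. The farm names, hop counts and latency figures are invented for illustration; this is not how Distributed Director actually worked internally, just a picture of why fewest-hops selection can pick the slower path.

    # Illustrative only: why hop count alone can pick the wrong farm.
    # The farms, hop counts and latencies below are made-up numbers.
    farms = {
        "east": {"hops": 4, "avg_latency_ms": 380},  # few hops, but congested path
        "west": {"hops": 7, "avg_latency_ms": 90},   # more hops, lightly loaded path
    }

    def pick_by_hops(farms):
        """BGP-style choice: fewest router hops wins, congestion ignored."""
        return min(farms, key=lambda name: farms[name]["hops"])

    def pick_by_latency(farms):
        """Measurement-based choice: lowest observed latency wins."""
        return min(farms, key=lambda name: farms[name]["avg_latency_ms"])

    print("hop-count choice:", pick_by_hops(farms))    # east, slow in practice
    print("latency choice:  ", pick_by_latency(farms)) # west, faster in practice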

Another offering from the dark ages was Global Center's ExpressLane product. It was a good idea, incorporating trace routes to measure congestion. The Motley Fool used it. When it worked, it worked pretty well; when it did not, our site was completely down. The product was eventually killed, as Global Center is an ISP, not a software shop.

In the past year, several companies have made great strides in the distributed load balancing market.
RadWare has its WSD-DS product (http://www.radware.com/www/wsd_ds.htm). F5 has a 3DNS product
(http://www.f5.com/3dns/index.html). Cisco still has Distributed Director
(http://www.cisco.com/warp/public/cc/cisco/mkt/scale/distr/index.shtml). GTE/BBN acquired Genuity
and the Hopscotch product (http://www.bbn.com/groups/ea/performance/traffic_dist.htm). Do these
products work? Probably. I have not used any of them. In fact, I think they are quickly becoming
completely irrelevant. Now before you tell me to put the crack pipe down, hear me out.

As I see it, there are two types of Web pages: static and customized. Static pages do not change after they
are published to a site, thus the name. The same page goes to every customer who requests it. As the
name suggests, customized pages can change for every single customer. A CGI or ASP script may be used
to grab information from a database and insert it into a page before sending it to a customer. What does
this have to do with anything?
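To make the distinction concrete, here is a minimal Python sketch. The user names and portfolio data are made up, and the dictionary lookup stands in for the database query a real CGI or ASP script would run.

    # Static vs. customized, in miniature. PORTFOLIOS stands in for a database.
    STATIC_PAGE = "<html><body><h1>About Us</h1><p>Same bytes for everyone.</p></body></html>"

    PORTFOLIOS = {"alice": ["AOL", "CSCO"], "bob": ["YHOO"]}

    def customized_page(user):
        """Assemble HTML per request from data specific to this customer."""
        rows = "".join("<li>%s</li>" % sym for sym in PORTFOLIOS.get(user, []))
        return "<html><body><h1>%s's portfolio</h1><ul>%s</ul></body></html>" % (user, rows)

    print(STATIC_PAGE)               # cacheable: identical for every customer
    print(customized_page("alice"))  # not cacheable as-is: differs per customer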

If you host mostly static content, it does not make sense to use distributed server farms. I think it makes
much more sense to maintain a single server farm and use a caching service such as Akamai
(http://www.akamai.com/home.shtml) or Sandpiper (http://www.sandpiper.com/). These services have LOTS of servers around the Internet; customers essentially rely on the service's own distributed load balancing for better performance, redundancy and scalability. This becomes even more attractive
when you consider that your hardware needs will be much lower than if you served every single page
yourself. Less hardware means fewer headaches. I don't know about you, but I could certainly do with
fewer hassles. It sounds good in theory. Does it work in practice? I think so.
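The mechanics of the offload can be as simple as pointing your image URLs at the caching network's hostname instead of your own, so those requests never touch your farm. Here is a rough Python sketch; the hostname is a placeholder, not a real Akamai or Sandpiper address, and each service has its own way of mapping URLs.

    import re

    CACHE_HOST = "static.cache-provider.example"  # placeholder, not a real CDN hostname

    def offload_images(html, origin_host):
        """Rewrite <img> URLs from the origin host to the caching network's host."""
        pattern = r'(<img[^>]+src=")https?://%s/' % re.escape(origin_host)
        return re.sub(pattern, r"\g<1>https://%s/" % CACHE_HOST, html)

    page = '<img src="http://www.fool.com/images/logo.gif">'
    print(offload_images(page, "www.fool.com"))
    # <img src="https://static.cache-provider.example/images/logo.gif">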

We use Akamai to serve static content on the Fool site. The only completely static files we have are our
graphics. They are the same for every single customer. While we have seen some glitches with the Akamai
system, overall I have been pretty pleased. The load on our servers is reduced. Our bandwidth usage is
also reduced (graphics are the bulk of data transferred). And the site feels faster to our customers. The
cost savings from the decrease in bandwidth and server resources do not completely offset the cost of the
Akamai service. However, when I factor in the better customer experience and fewer technical headaches,
tough to quantify though they are, I think Akamai more than pays for itself. While I have not used
Sandpiper, I have talked to their CTO and several of their techs. It sounds pretty interesting. All that said,
the use of these caching services is not without problems.

The main problem with caching has been usage tracking. To get around it, you can put a tiny graphic, one the caching service is not allowed to cache, in the footer of every page. That graphic is then requested for every page served. Since your Web servers will cache the graphic, the load on the boxes should not be too great, and since the graphic is small, the bandwidth requirements should be minimal. Ad serving should not be a problem: the graphic downloads just described tell you how many page views you received, and click-throughs on the ad banners will still hit your ad server. There are other issues, among them expiration, honoring of TTLs, and maintaining control. However, I think the benefits far outweigh the costs.
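Here is a rough Python sketch of the counting side. It assumes common-log-format access logs and a made-up pixel path; your log format and reporting needs will differ.

    from collections import Counter

    PIXEL_PATH = "/t/pixel.gif"   # hypothetical path for the tiny tracking graphic

    def count_page_views(log_lines):
        """Count pixel requests per day from common-log-format lines."""
        views = Counter()
        for line in log_lines:
            parts = line.split('"')
            if len(parts) < 2 or PIXEL_PATH not in parts[1]:
                continue                                     # not a pixel request
            day = line.split("[", 1)[1].split(":", 1)[0]     # e.g. '06/Aug/1999'
            views[day] += 1
        return views

    sample = [
        '1.2.3.4 - - [06/Aug/1999:12:00:01 -0400] "GET /t/pixel.gif?p=/news HTTP/1.0" 200 43',
        '1.2.3.5 - - [06/Aug/1999:12:00:02 -0400] "GET /images/logo.gif HTTP/1.0" 200 912',
    ]
    print(count_page_views(sample))   # Counter({'06/Aug/1999': 1})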

In a nutshell, if you are serving static content, I think it makes much more sense to forget about doing the
distributed load balancing yourself and let someone else worry about the distributed sites. Akamai and
Sandpiper have better distribution than you will be able to achieve anyway. By working with such a service, you can get redundancy, scalability, and a better customer experience with minimal pain,
anguish and gnashing of teeth. The cost of this kind of caching is also significantly less than the cost of
maintaining your own servers in numerous networks. What about dynamic content?

Does it make sense to use distributed sites if you serve dynamic content? The answer is, "Maybe." If you
don't make extensive use of databases, distributed sites may make sense. If you can handle the site
management and costs, and if speed and reliability are crucial to your business, using distributed sites
makes sense. However, if you make extensive use of databases, particularly inserts and updates, you
probably do not want to use distributed sites. Why is this?

In one sentence: Two-way, near real-time database replication over the Internet is a pain in the butt, if
not impossible. Database replication can be a PITA as it is. Place one database cluster in San Francisco
and another in New York and replication REALLY gets painful.

We actually tried to make this work for the Fool. We had distributed sites and hoped to use a database
server at each site. After getting into the replication project, we decided it was not worth the effort. There
was one huge problem we could not get around. Suppose a customer changes information in a database
on a server in New York. Something happens and that customer is bumped to the servers in San
Francisco. She changes information there before any replication can happen. What do you do? You have
the same record in the same table on two different servers with different values in the fields. How do you
reconcile that? We could never come up with a satisfactory way to handle this. We came up with some
kludges, but nothing worth acting on. So we consolidated all database activity at one server farm. As our
site became more and more dynamic, the traffic to the nondatabase farm dropped off to nothing. Finally
it did not make good financial sense to maintain both sites. So we closed the nondatabase farm.
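Here is a toy Python illustration of the problem. The row, fields and timestamps are invented; the last-writer-wins rule shown is just one obvious kludge, and it makes the trouble plain: somebody's edit gets silently thrown away.

    # Same row, updated on two farms before replication runs.
    new_york = {"row": 1001, "email": "fool@aol.example",  "updated": "1999-08-06 12:00:03"}
    san_fran = {"row": 1001, "email": "fool@home.example", "updated": "1999-08-06 12:00:05"}

    def reconcile(a, b):
        """Last-writer-wins: keep whichever copy has the later timestamp."""
        return a if a["updated"] >= b["updated"] else b

    print(reconcile(new_york, san_fran))  # San Francisco's edit survives; New York's is lost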

If you use a database for your site and it is primarily read only, then it is much easier to do distributed
load balancing. In this model, you have one publisher and several subscribers. You can simply publish
the database every X hours to make this work. If speed and reliability are crucial, distributed load
balancing in this scenario may make sense.
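A bare-bones Python sketch of the publish-every-X-hours idea follows. The file copy stands in for whatever snapshot or replication mechanism your database actually provides; the paths and interval are placeholders.

    import shutil
    import time

    PUBLISHER_DB = "master.db"                          # written only at the main farm
    SUBSCRIBER_DBS = ["farm_east.db", "farm_west.db"]   # read-only copies at remote farms
    PUBLISH_INTERVAL_HOURS = 6

    def publish_snapshot():
        """Copy the publisher's database out to every subscriber farm."""
        for target in SUBSCRIBER_DBS:
            shutil.copyfile(PUBLISHER_DB, target)

    if __name__ == "__main__":
        while True:
            publish_snapshot()
            time.sleep(PUBLISH_INTERVAL_HOURS * 3600)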

I think that if you are serving static content, it makes much more sense to use a caching network like
Akamai or Sandpiper than to do distributed load balancing yourself.