Distributing the load: distributed load balancing in server farms
Network World, August 6, 1999
By Dwight Gibbs
Since I wrote about local load balancing last week, it seems only natural that this week's column takes the next step and focuses on distributed load balancing.
First, let's define the term. Distributed load balancing involves using two or more geographically dispersed server farms to serve content. The idea is to have some or all of the same content at each farm. Why would anyone do this? There are three main reasons:
Customer Experience: Ideally this will place the content closer to the customer. The goal is to route customer requests to the server farm nearest to the customer. This will, in theory, provide quicker access and, therefore, a better customer experience. Sounds great, no?
Redundancy: Another reason for doing the distributed load-balancing gig is redundancy. In the event of a catastrophic failure at one server farm, all traffic would be rerouted to another farm.
Scalability: Using distributed load balancing, you can bring servers at different locations online to handle increased loads.
In the old days (18 to 24 months ago), the tools to facilitate distributed load balancing were, quite frankly, pretty bad. Cisco had its Distributed Director. It worked. The problem was that it relied solely on Border Gateway Protocol (BGP), so it sent customers to the server farm that was the fewest hops away from the customer's ISP. It did not factor congestion into the equation. As a result, customers could be routed over a short path whose routers were totally overloaded when a longer but less-taxed path would have served them faster.
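To make the hop-count problem concrete, here is a minimal sketch, not Distributed Director's actual logic, contrasting "fewest hops" selection with selection based on measured round-trip time. The farm names and metric values are made up for illustration.

    # Hypothetical metrics for two server farms: the "short" path is congested,
    # the "long" path is lightly loaded.
    farms = {
        "east": {"hops": 4, "rtt_ms": 310},   # few hops, overloaded routers
        "west": {"hops": 9, "rtt_ms": 80},    # more hops, less-taxed path
    }

    def pick_by_hops(farms):
        # BGP-style choice: shortest path wins, congestion ignored
        return min(farms, key=lambda name: farms[name]["hops"])

    def pick_by_rtt(farms):
        # Latency-aware choice: fastest measured path wins
        return min(farms, key=lambda name: farms[name]["rtt_ms"])

    print(pick_by_hops(farms))   # "east" -- fewest hops, slow for the customer
    print(pick_by_rtt(farms))    # "west" -- more hops, better experience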
Another offering from the dark ages was Global Center's ExpressLane product. It was a good idea, incorporating trace routes to measure congestion, and The Motley Fool used it. When it worked, it worked pretty well. When it did not, our site was completely down. The product was eventually killed, as Global Center is an ISP, not a software shop.
In the past year, several companies have made great strides in the distributed load balancing market. RadWare has its WSD-DS product (http://www.radware.com/www/wsd_ds.htm). F5 has a 3DNS product (http://www.f5.com/3dns/index.html). Cisco still has Distributed Director (http://www.cisco.com/warp/public/cc/cisco/mkt/scale/distr/index.shtml). GTE/BBN acquired Genuity and the Hopscotch product (http://www.bbn.com/groups/ea/performance/traffic_dist.htm). Do these products work? Probably. I have not used any of them. In fact, I think they are quickly becoming completely irrelevant. Now before you tell me to put the crack pipe down, hear me out.
As I see it, there are two types of Web pages: static and customized. Static pages do not change after they are published to a site, thus the name. The same page goes to every customer who requests it. As the name suggests, customized pages can change for every single customer. A CGI or ASP script may be used to grab information from a database and insert it into a page before sending it to a customer. What does this have to do with anything?
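If the distinction seems abstract, here is a minimal sketch of the two page types. The names (serve_static, serve_customized, get_portfolio, customer_id) are illustrative stand-ins, not anything from our actual CGI or ASP code.

    # A static page: the same bytes go to every customer who requests it.
    STATIC_PAGE = "<html><body>Welcome to our site!</body></html>"

    def serve_static(request):
        return STATIC_PAGE   # identical for every customer

    # A customized page: built per request from data looked up for that customer.
    def serve_customized(request, db):
        holdings = db.get_portfolio(request["customer_id"])  # per-customer lookup
        rows = "".join("<li>%s</li>" % h for h in holdings)
        return "<html><body><ul>%s</ul></body></html>" % rows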
If you host mostly static content, it does not make sense to use distributed server farms. I think it makes much more sense to maintain a single server farm and use a caching service such as Akamai (http://www.akamai.com/home.shtml) or Sandpiper (http://www.sandpiper.com/). These services have LOTS of servers around the Internet, and their customers essentially rely on the providers' distributed load balancing for better performance, redundancy and scalability. This becomes even more attractive when you consider that your hardware needs will be much lower than if you served every single page yourself. Less hardware means fewer headaches. I don't know about you, but I could certainly do with fewer hassles. It sounds good in theory. Does it work in practice? I think so.
We use Akamai to serve static content on the Fool site. The only completely static files we have are our graphics. They are the same for every single customer. While we have seen some glitches with the Akamai system, overall I have been pretty pleased. The load on our servers is reduced. Our bandwidth usage is also reduced (graphics are the bulk of data transferred). And the site feels faster to our customers. The cost savings from the decrease in bandwidth and server resources do not completely offset the cost of the Akamai service. However, when I factor in the better customer experience and fewer technical headaches, tough to quantify though they are, I think Akamai more than pays for itself. While I have not used Sandpiper, I have talked to their CTO and several of their techs. It sounds pretty interesting. All that said, the use of these caching services is not without problems.
The main problem with caching has been usage tracking. To get around that, you can put a tiny graphic, one that you do not allow the caching service to cache, in the footer of every page. That graphic will then be requested from your servers for every page served. Since your Web servers will cache the graphic, the load on the boxes should not be too great, and since the graphic is small, the bandwidth requirements should be minimal. Ad serving should not be a problem either: the graphic download just described tells you how many page views you received, and click-throughs on the ad banners will still hit your ad server. There are other issues, among them cache expiration, honoring of TTLs and maintaining control of your content. However, I think the benefits far outweigh the costs.
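Here is a minimal sketch of how you might tally page views from your access log by counting requests for that uncached footer graphic. The graphic path, log file name and common-log-style line format are assumptions, not tied to any particular product.

    from collections import Counter

    PIXEL_PATH = "/img/track.gif"   # hypothetical name for the tiny footer graphic

    def count_page_views(log_path):
        # One request for the pixel == one page view; group the tallies by day.
        views = Counter()
        with open(log_path) as log:
            for line in log:
                parts = line.split()
                # Common log format: host ident user [date:time tz] "GET /path HTTP/1.0" status bytes
                if len(parts) > 6 and parts[6] == PIXEL_PATH:
                    day = parts[3].lstrip("[").split(":")[0]   # e.g. 06/Aug/1999
                    views[day] += 1
        return views

    # Example: print(count_page_views("access.log"))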
In a nutshell, if you are serving static content, I think it makes much more sense to forget about doing the distributed load balancing yourself and let someone else worry about the distributed sites. Akamai and Sandpiper have better distribution than you will be able to achieve anyway. By working with one of these services, you can achieve redundancy, scalability, and a better customer experience with minimal pain, anguish and gnashing of teeth. The cost of this kind of caching is also significantly less than the cost of maintaining your own servers in numerous networks. What about dynamic content?
Does it make sense to use distributed sites if you serve dynamic content? The answer is, "Maybe." If you don't make extensive use of databases, if you can handle the added site management and costs, and if speed and reliability are crucial to your business, distributed sites can make sense. However, if you make extensive use of databases, particularly inserts and updates, you probably do not want to use distributed sites. Why is this?
In one sentence: Two-way, near real-time database replication over the Internet is a pain in the butt, if not impossible. Database replication can be a PITA as it is. Place one database cluster in San Francisco and another in New York and replication REALLY gets painful.
We actually tried to make this work for the Fool. We had distributed sites and hoped to use a database server at each site. After getting into the replication project, we decided it was not worth the effort. There was one huge problem we could not get around. Suppose a customer changes information in a database on a server in New York. Something happens and that customer is bumped to the servers in San Francisco. She changes information there before any replication can happen. What do you do? You have the same record in the same table on two different servers with different values in the fields. How do you reconcile that? We could never come up with a satisfactory way to handle this. We came up with some kludges, but nothing worth acting on. So we consolidated all database activity at one server farm. As our site became more and more dynamic, the traffic to the nondatabase farm dropped off to nothing. Finally it did not make good financial sense to maintain both sites. So we closed the nondatabase farm.
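To show why this stumped us, here is a minimal sketch of the conflict itself: the same record updated at both sites before replication runs. The record layout and version counters are illustrative only; real replication products track changes differently.

    # The same row starts out identical at both farms.
    new_york      = {"cust_42": {"email": "old@fool.com", "version": 7}}
    san_francisco = {"cust_42": {"email": "old@fool.com", "version": 7}}

    # The customer updates her record in New York...
    new_york["cust_42"] = {"email": "ny@fool.com", "version": 8}
    # ...gets bumped to San Francisco and updates it again before replication runs.
    san_francisco["cust_42"] = {"email": "sf@fool.com", "version": 8}

    def detect_conflicts(a, b):
        # Rows at the same version with different values cannot be merged automatically;
        # something (or someone) has to decide which edit wins.
        return [key for key in a
                if key in b
                and a[key]["version"] == b[key]["version"]
                and a[key] != b[key]]

    print(detect_conflicts(new_york, san_francisco))   # ['cust_42']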
If you use a database for your site and it is primarily read only, then it is much easier to do distributed load balancing. In this model, you have one publisher and several subscribers. You can simply publish the database every X hours to make this work. If speed and reliability are crucial, distributed load balancing in this scenario may make sense.
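Here is a minimal sketch of that read-mostly model: one writable publisher snapshots its data every X hours, and the read-only subscribers replace their copies wholesale, so there is nothing to merge and nothing to conflict. The interval and the transport (a plain in-memory copy) are stand-ins for whatever snapshot replication your database actually provides.

    import copy
    import time

    PUBLISH_INTERVAL_HOURS = 4   # the "X hours"; tune to how stale your reads may be

    class Publisher:
        def __init__(self):
            self.data = {}       # the single writable master copy

        def snapshot(self):
            return copy.deepcopy(self.data)

    class Subscriber:
        def __init__(self):
            self.data = {}       # read-only copy served to customers at this farm

        def refresh(self, snapshot):
            self.data = snapshot # replaced wholesale; no merging, so no conflicts

    def publish_forever(publisher, subscribers):
        # Push a fresh snapshot to every subscriber on a fixed schedule.
        while True:
            snap = publisher.snapshot()
            for sub in subscribers:
                sub.refresh(snap)
            time.sleep(PUBLISH_INTERVAL_HOURS * 3600)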
I think that if you are serving static content, it makes much more sense to use a caching network like Akamai or Sandpiper than to do distributed load balancing yourself. And if your content is dynamic and leans heavily on database writes, keep the database in one place; distributed sites only start to make sense again when the data is mostly read-only.