Distributing the load: distributed load balancing in server farms 
Network World, August 6, 1999
By Dwight Gibbs
Since I wrote about local load balancing last week, it seems only natural that this week's column takes the next step and focuses on distributed load balancing.
First, let's define the term. Distributed load balancing involves using two or more geographically dispersed server farms to serve content. The idea is to have some or all of the same content at each farm. Why would anyone do this? There are three main reasons:
Customer Experience: Ideally this will place the content closer to the customer. The goal is to route customer requests to the server farm nearest to the customer. This will, in theory, provide quicker access and, therefore, a better customer experience. Sounds great, no?
Redundancy: Another reason for doing the distributed load-balancing gig is redundancy. In the event of a catastrophic failure at one server farm, all traffic would be rerouted to another farm.
Scalability: Using distributed load balancing, you can bring servers at different locations online to handle increased loads.
In the old days (18 to 24 months ago), the tools to facilitate distributed load balancing were, quite frankly, pretty bad. Cisco had its Distributed Director. It worked. The problem was that it relied solely on Border Gateway Protocol (BGP), so it would send customers to the server farm that was the fewest hops away from the customer's ISP. It did not factor congestion into the equation. As a result, customers could be sent over router hops that were totally overloaded when they would have been better served by a longer but less-taxed path.
Another offering from the dark ages was Global Center's ExpressLane product. It was a good idea, incorporating trace routes to measure congestion, and The Motley Fool used it. When it worked, it worked pretty well. When it did not work, our site was completely down. The product was eventually killed, as Global Center is an ISP, not a software shop.
In the past year, several companies have made great strides in the distributed load balancing market. RadWare has its WSD-DS product (http://www.radware.com/www/wsd_ds.htm). F5 has a 3DNS product (http://www.f5.com/3dns/index.html). Cisco still has Distributed Director (http://www.cisco.com/warp/public/cc/cisco/mkt/scale/distr/index.shtml). GTE/BBN acquired Genuity and the Hopscotch product (http://www.bbn.com/groups/ea/performance/traffic_dist.htm). Do these products work? Probably. I have not used any of them. In fact, I think they are quickly becoming completely irrelevant. Now before you tell me to put the crack pipe down, hear me out.
As I see it, there are two types of Web pages: static and customized. Static pages do not change after they are published to a site, thus the name. The same page goes to every customer who requests it. As the name suggests, customized pages can change for every single customer. A CGI or ASP script may be used to grab information from a database and insert it into a page before sending it to a customer. What does this have to do with anything?
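To make the distinction concrete, here is a minimal sketch, not from the column, of the two page types in Python: the static page is the same bytes for everyone, while the customized page is assembled per request, the way a CGI or ASP script would pull a value out of a database. The portfolio table and customer names are hypothetical.

    # A hypothetical stand-in for a database table of customer portfolios.
    PORTFOLIOS = {
        "alice": "AOL, YHOO",
        "bob": "KO, XON",
    }

    # The static page: published once, identical for every customer.
    STATIC_PAGE = "<html><body><h1>Welcome to the site</h1></body></html>"

    def customized_page(customer):
        """Build a page that can differ for every customer who requests it."""
        holdings = PORTFOLIOS.get(customer, "no holdings on file")
        return ("<html><body><h1>Hello %s</h1>"
                "<p>Your portfolio: %s</p></body></html>" % (customer, holdings))

    if __name__ == "__main__":
        print(STATIC_PAGE)               # the same bytes for every request
        print(customized_page("alice"))  # different output per customer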
If you host mostly static content, it does not make sense to use distributed server farms. I think it makes much more sense to maintain a single server farm and use a caching service such as Akamai (http://www.akamai.com/home.shtml) or Sandpiper (http://www.sandpiper.com/). These services have LOTS of servers around the Internet. As a customer, you are essentially relying on their distributed load balancing to achieve better performance, redundancy and scalability. This becomes even more attractive when you consider that your hardware needs will be much lower than if you served every single page yourself. Less hardware means fewer headaches. I don't know about you, but I could certainly do with fewer hassles. It sounds good in theory. Does it work in practice? I think so.
We use Akamai to serve static content on the Fool site. The only completely static files we have are our graphics. They are the same for every single customer. While we have seen some glitches with the Akamai system, overall I have been pretty pleased. The load on our servers is reduced. Our bandwidth usage is also reduced (graphics are the bulk of data transferred). And the site feels faster to our customers. The cost savings from the decrease in bandwidth and server resources do not completely offset the cost of the Akamai service. However, when I factor in the better customer experience and fewer technical headaches, tough to quantify though they are, I think Akamai more than pays for itself. While I have not used Sandpiper, I have talked to their CTO and several of their techs. It sounds pretty interesting. All that said, the use of these caching services is not without problems.
The main problem with caching has been usage tracking. To get around that, you can put a tiny graphic, one that you do not allow to be cached, in the footer of every page. Because it is never cached, that graphic will be requested from your own servers for every page served. Since your Web servers will cache the graphic, the load on the boxes should not be too great, and since the graphic is small, the bandwidth requirements should be minimal. Ad serving should not be a problem, as the graphic downloads just described tell you how many page views you received, and click-throughs on the ad banners will still hit your ad server. There are other issues, among them expiration, honoring of TTLs and maintaining control. However, I think the benefits far outweigh the costs.
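Here is a minimal sketch of that tracking trick, assuming a small standalone Python server standing in for your farm; the /track.gif path, port and pageviews.log file name are my own inventions for illustration, not anything the Fool actually ran.

    from http.server import BaseHTTPRequestHandler, HTTPServer

    # The smallest useful payload: a 1x1 transparent GIF held in memory.
    PIXEL = (b"GIF89a\x01\x00\x01\x00\x80\x00\x00\x00\x00\x00\xff\xff\xff"
             b"!\xf9\x04\x01\x00\x00\x00\x00,\x00\x00\x00\x00\x01\x00\x01\x00\x00"
             b"\x02\x02D\x01\x00;")

    class PixelHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path.startswith("/track.gif"):
                # The Referer header names the cached page that was just viewed.
                page = self.headers.get("Referer", "unknown")
                with open("pageviews.log", "a") as log:
                    log.write(page + "\n")
                self.send_response(200)
                self.send_header("Content-Type", "image/gif")
                # Tell the caching service and browsers not to cache the pixel,
                # so every page view produces exactly one hit here.
                self.send_header("Cache-Control", "no-cache")
                self.send_header("Content-Length", str(len(PIXEL)))
                self.end_headers()
                self.wfile.write(PIXEL)
            else:
                self.send_error(404)

    if __name__ == "__main__":
        # Every page served from the cache would carry something like
        # <img src="http://www.yoursite.com/track.gif"> in its footer.
        HTTPServer(("", 8080), PixelHandler).serve_forever()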
In a nutshell, if you are serving static content, I think it makes much more sense to forget about doing the distributed load balancing yourself and let someone else worry about the distributed sites. Akamai and Sandpiper have better distribution than you will be able to achieve anyway. By working with such a service, you can achieve redundancy, scalability and a better customer experience with minimal pain, anguish and gnashing of teeth. The cost of this kind of caching is also significantly less than the cost of maintaining your own servers in numerous networks. What about dynamic content?
Does it make sense to use distributed sites if you serve dynamic content? The answer is, "Maybe." If you don't make extensive use of databases, distributed sites may make sense. If you can handle the site management and costs, and if speed and reliability are crucial to your business, using distributed sites makes sense. However, if you make extensive use of databases, particularly inserts and updates, you probably do not want to use distributed sites. Why is this?
In one sentence: Two-way, near real-time database replication over the Internet is a pain in the butt, if not impossible. Database replication can be a PITA as it is. Place one database cluster in San Francisco and another in New York and replication REALLY gets painful.
We actually tried to make this work for the Fool. We had distributed sites and hoped to use a database server at each site. After getting into the replication project, we decided it was not worth the effort. There was one huge problem we could not get around. Suppose a customer changes information in a database on a server in New York. Something happens and that customer is bumped to the servers in San Francisco. She changes information there before any replication can happen. What do you do? You have the same record in the same table on two different servers with different values in the fields. How do you reconcile that? We could never come up with a satisfactory way to handle this. We came up with some kludges, but nothing worth acting on. So we consolidated all database activity at one server farm. As our site became more and more dynamic, the traffic to the nondatabase farm dropped off to nothing. Eventually it no longer made good financial sense to maintain both sites, so we closed the nondatabase farm.
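A small sketch of that conflict, with hypothetical table and field names and plain dictionaries standing in for the two database clusters, shows why there is no clean answer once both farms have accepted a write:

    import copy

    # Each farm starts from an identical copy of the customer table.
    new_york = {"customer_42": {"email": "fool@aol.com", "version": 7}}
    san_francisco = copy.deepcopy(new_york)

    # The customer updates her record in New York...
    new_york["customer_42"].update(email="fool@home.com", version=8)

    # ...gets bumped to San Francisco and updates it there too,
    # before any replication has happened.
    san_francisco["customer_42"].update(email="fool@work.com", version=8)

    # When the farms finally exchange changes, both copies are at version 8
    # with different field values, and nothing in the data says which one
    # the customer actually wants.
    ny = new_york["customer_42"]
    sf = san_francisco["customer_42"]
    if ny["version"] == sf["version"] and ny != sf:
        print("Write-write conflict: %s vs %s" % (ny["email"], sf["email"]))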
If you use a database for your site and it is primarily read only, then it is much easier to do distributed load balancing. In this model, you have one publisher and several subscribers. You can simply publish the database every X hours to make this work. If speed and reliability are crucial, distributed load balancing in this scenario may make sense.
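Here is a minimal sketch of that one-publisher, many-subscribers arrangement; the in-memory dictionaries, the six-hour interval and the farm count are assumptions for illustration only.

    import copy

    publisher = {"headline": "Market up today", "quote_count": 1200}  # the one writable copy
    subscribers = [{} for _ in range(3)]                              # read-only distributed farms

    PUBLISH_INTERVAL_HOURS = 6  # the "every X hours" on the publish schedule

    def publish(source, targets):
        """Overwrite each subscriber with a full snapshot of the publisher."""
        snapshot = copy.deepcopy(source)
        for farm in targets:
            farm.clear()
            farm.update(snapshot)

    if __name__ == "__main__":
        # In production this would run on a schedule; one round is shown here.
        publish(publisher, subscribers)
        print(subscribers[0]["headline"])  # every farm now serves the same data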
I think that if you are serving static content, it makes much more sense to use a caching network like Akamai or Sandpiper than to do distributed load balancing yourself.