Technology Stocks : LSI Corporation


To: nord who wrote (17056)  2/16/1999 3:48:00 AM
From: nord
 
dmreview.com

Scalable Systems Architecture: Hypergrowth and the Web

by Ken Rudin

The Web has rapidly become an effective medium for people and
businesses to communicate and collaborate. Nearly every area of
information technology has been affected by the pervasiveness of the
Web. Data warehouses are no exception: a large (and rapidly growing)
percentage of them are being connected to the Web. By increasing access to your warehouse, you can increase the average knowledge level of your organization.

Of course, even before the Web, you could always increase the leverage
of your warehouse by giving more people access to it, but it was never very easy to do so. You had to struggle with proprietary networks, proprietary client/server protocols and special
client-side applications. The Web (along with Web-based technologies) makes this far easier than before. You don't have to worry about installing additional client software (everyone just uses a Web browser) or distributing application updates to all users (the application logic is stored centrally on the server, not in the browser). And, since the Web is ubiquitous, you don't have to worry about connectivity issues. With the Web, the infrastructure is already in place to let users access your warehouse from anywhere in the world.

But, universal access via the Web creates a whole set of issues that
must be handled. Distilled to its essence, Web access means your
warehouse will be exposed to more access by more users to more data.
These increases will, in turn, put more strain on your warehouse.
Whatever level of scalability requirements your warehouse had will
become magnified once you connect your warehouse to the Web.

Web Access Means More Users

Let's look at the "more users" issue. One of the most powerful reasons
for connecting your warehouse to the Web is to make it more accessible. It follows that if you increase accessibility, you will have more users than you would otherwise have. Not only will you have more users, but each user will typically access the warehouse more frequently. Because Web browsers are ubiquitous, and because most people keep them running on their computers all the time, users are more inclined to access the warehouse simply because it is so easy to do so.

But it's not just the initial increased size of your user population
that stresses the system. It's also the rate at which your warehouse
user population grows to this increased size. This problem is most
noticeable if you have a portion of your warehouse that is "public": for
example, it's available to all employees via a corporate intranet, or it's available over the Internet to your suppliers or customers. Once you decide to make the data "public," you can no longer easily control the ramp-up in the number of users. At first, a few users in the population will experiment with the warehouse, then others will see the benefit and want to use it as well, and so on. Very quickly, you will find yourself with a large user population that is growing exponentially.

Hypergrowth and the Web

I call this phenomenon "hypergrowth." It refers to the fact that your
user base will grow faster than your ability to scale up your
warehouse's resources to meet the growing requirements. You simply
can't add new CPUs, disk drives and memory (and test them all to make sure you have no bottlenecks) fast enough to keep up with demand.

There are two approaches to dealing with hypergrowth. The first approach is simply to avoid the problem. Many warehouse developers become overzealous about making the warehouse instantly available to the "public." In most cases, I would suggest proceeding with caution. If you're not sure of the usage patterns, I would recommend against making it public initially. Instead, use the traditional approach of rolling your warehouse out to a few users, then a few more, etc. This can be done by password-protecting the access and giving the password only to select groups. (Of course, people can share the password with people outside their group, but you can monitor usage to see if unintended users are accessing your warehouse.)
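
As a rough illustration of the usage monitoring mentioned above, the following Python sketch compares the identifiers seen in an access log against the users you intended to reach. It assumes that each access can be tied to some identifier (a login name, a workstation address, etc.); the function name, the log format and the sample names are all hypothetical and are not part of any particular warehouse product.

```python
from collections import Counter


def unintended_users(access_log, authorized_users):
    """Count accesses per identifier that is not on the authorized list.

    access_log       -- iterable of identifiers, one entry per access
                        (hypothetical format, for illustration only)
    authorized_users -- set of identifiers in the groups you rolled out to
    """
    counts = Counter(
        user for user in access_log if user not in authorized_users
    )
    return dict(counts)


if __name__ == "__main__":
    # Made-up sample data: "mallory" was never given the password.
    log = ["alice", "bob", "mallory", "alice", "mallory"]
    pilot_group = {"alice", "bob"}
    print(unintended_users(log, pilot_group))
```

A growing count of unexpected identifiers is an early sign that the password is spreading and that the controlled ramp-up is turning into hypergrowth.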

However, sometimes there is a valid reason to make your warehouse
accessible to your entire organization. In that case, since we can't
avoid hypergrowth, we need the second approach: handling it. The key to handling hypergrowth is to note that it only occurs during the initial stages of your warehouse's life cycle. The trick is to build the initial iteration of your warehouse so that it has enough resources to handle where you will be when the hypergrowth subsides. To do this, you have to determine where you think that point will be, which you do by looking at the growth plan for the warehouse. How many users do you expect, what workload will they be generating and in what time frame?

After looking at the growth plans for your warehouse (or at least making your best guesses), you can define a graph that looks something like Figure 1. According to this graph, we can see that we expect the
hypergrowth phase to level off sometime in May, supporting roughly 350
users. So, even though we plan to start with only about 50 users, since we expect to grow extremely rapidly to 350 users, we build our first iteration to support 350 users rather than 50.
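
To make the sizing rule concrete, here is a minimal Python sketch that scans a growth plan for the point at which hypergrowth subsides. Only the 50-user starting point and the roughly 350-user plateau in May come from the example above; the intermediate monthly figures, the 5 percent threshold and the function name are assumptions made purely for illustration.

```python
# Hypothetical growth plan: (month, expected users). Only the 50-user start
# and the ~350-user level in May come from the article; every other figure
# is invented for illustration.
GROWTH_PLAN = [
    ("Jan", 50),
    ("Feb", 120),
    ("Mar", 220),
    ("Apr", 310),
    ("May", 350),
    ("Jun", 355),
    ("Jul", 360),
]


def find_plateau(plan, max_monthly_growth=0.05):
    """Return the first (month, users) after which growth stays small.

    A month counts as the plateau when every later month-over-month
    increase is at or below max_monthly_growth (an assumed threshold).
    If growth never settles, the last planned month is returned.
    """
    for i in range(len(plan) - 1):
        settled = all(
            (plan[j + 1][1] - plan[j][1]) / plan[j][1] <= max_monthly_growth
            for j in range(i, len(plan) - 1)
        )
        if settled:
            return plan[i]
    return plan[-1]


if __name__ == "__main__":
    month, users = find_plateau(GROWTH_PLAN)
    print(f"Size the first iteration for about {users} users ({month}).")
```

With these assumed figures the sketch picks May and 350 users, so the first iteration would be sized for 350 users rather than the 50 you start with.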

Web Access Means More Data

Next, let's look at why Web-enabled data warehouses imply more data. The answer is intuitive: the graphical nature of the Web makes it natural and simple to request multimedia data. For example, an insurance company may choose to store not only the traditional numeric and text data about car accident insurance claims, but also a digitized photograph of the car itself. End users could use the numeric and text data to perform their analytical processing, and then drill down on specific data items to get both the traditional data on a particular record and the related image. With the potential need to store large numbers of images, the trend toward larger and more rapidly growing data warehouses will only accelerate. In addition, not only will your warehouse be responsible for storing more data, but requests for multimedia data also require much more bandwidth than requests for traditional data types.
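
A back-of-the-envelope calculation shows how quickly images change the picture. The Python sketch below estimates the extra storage for claim photographs and the time to send one record over a link. Every number in it (claim count, photos per claim, photo size, row size, link speed) is an assumption chosen for illustration, not a measurement from any real warehouse.

```python
def image_storage_bytes(num_claims, photos_per_claim, avg_photo_bytes):
    """Total bytes needed to store the photographs for all claims."""
    return num_claims * photos_per_claim * avg_photo_bytes


def transfer_seconds(payload_bytes, link_bits_per_second):
    """Seconds to send a payload over a link, ignoring protocol overhead."""
    return payload_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    # All figures below are assumed values for illustration only.
    claims = 100_000
    photos_per_claim = 2
    photo_bytes = 200_000      # one digitized photograph
    row_bytes = 2_000          # numeric and text data for one claim
    link_bps = 56_000          # a dial-up class connection

    total = image_storage_bytes(claims, photos_per_claim, photo_bytes)
    print(f"Extra storage for photos: {total / 1e9:.0f} GB")
    print(f"One text record: {transfer_seconds(row_bytes, link_bps):.2f} s")
    print(f"One photograph:  {transfer_seconds(photo_bytes, link_bps):.2f} s")
```

Under these assumptions a single photograph takes about a hundred times longer to deliver than the text record it belongs to, which is why both storage and bandwidth have to be planned for.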

Ultimately, what does more access by more users to more data really
mean? It means that the need for a scalable warehouse environment increases. Addressing these issues requires scalable design principles, such as those I've discussed previously in this column. Just remember that these scalable techniques become even more critical once your warehouse is connected to the Web.

1 For simplicity, we will use the term "Web" to refer both to the World Wide Web and to intranets that use Web-based technology.

------------------------------------------------------------------------
Ken Rudin is the CEO of Emergent Corporation, an independent consulting firm dedicated to helping businesses design and implement scalable IT solutions. He has published many articles on designing and implementing scalable solutions. He can be reached at akrudin@emergent.com.