To: Captain Jack who wrote (82985) 6/28/2000 9:59:00 AM From: rudedog
Jack - hierarchical storage research advocates have been looking at this for years, but it takes someone with IBM's clout to make it possible. Here's how it works - in an overall computer architecture, stored data is accessed at very different frequencies, and the cost of storage can reflect that. Data which is accessed only a few times a year can go into a tape library. Data which is accessed maybe weekly can go to a central storage farm rather than being locally available. Frequently accessed common files can go on a local file server. Very frequently used disk store, like swap files, can go on local (in the box) disk. Files currently in use are "virtualized" and appear to the OS to be in RAM - the more frequently used "pages" actually are in RAM. All of this can be managed by a comprehensive HSM (hierarchical storage manager) which moves data between tiers in the background.

What the IBM guys have done is take this one step further. Looking at things from the CPU's perspective, there are three key levels of access speed - all of them thousands of times faster than any external media, and maybe 50 times faster even than RAM access. The first is the internal processor I&D (instruction and data) cache, which is part of the actual instruction execution pipeline and runs at "core" speed - i.e. at the processor clock rate. This is usually known as L1 (level 1) cache. The second level is a small cache which runs at either processor speed or half processor speed and tries to keep the most likely instructions and data available to the L1 cache. Even though this L2 cache can run at clock speed, it is still slower than L1, because it is not in the direct execution pipeline and can cause a "processor stall," where the ALU or other CPU components must wait for a transfer from L2 to the I&D cache. Much of the work in superscalar design goes into optimizing the way L1 and L2 work together to reduce processor stalls, and includes things like predictive prefetch, where both sides of a branch get loaded into L2 so performance improves whichever way the branch goes.

Finally there is L3 cache. This is not typically implemented in desktop systems, but it is a feature of many big system designs, especially big switch-fabric designs like the Alpha Wildfire. The L3 cache makes memory access "uniform" - the time to access memory is about the same whether the memory is local or on the other side of a switch. L3 can also be used when the gap between RAM access and L2 access is very large. In today's fast processor designs that gap can be 16 to 1 or more, and L3 presents memory that appears to be only half or a quarter the speed of L2, rather than 1/16 or less.

What IBM has done is introduce an additional layer of memory caching which sits somewhere between the processor cache algorithms and HSM. The memory subsystem itself can take less frequently used data in RAM and compress it, using techniques similar to those we see in "zip" files and MPEG compression, but optimized for in-memory use. The processor does not know that some memory is "fast" and some is "slow," any more than it knows whether data is in L3 cache or in RAM. The memory subsystem compresses less frequently used data and decompresses it when it is accessed. The success of this technique depends heavily on how good the predictions about data access rates are - and, of course, on the job stream.
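To make the idea concrete, here is a toy software sketch of a memory subsystem that transparently compresses pages which have gone "cold" and decompresses them again the moment they are touched. It is only an illustration - the class, the zlib codec, and the time-based coldness threshold are my own stand-ins, not how IBM's hardware actually does it:

```python
import time
import zlib


class CompressedMemory:
    """Toy model of a memory subsystem that transparently compresses
    pages that have not been touched recently. Hypothetical sketch -
    real hardware would use a dedicated compression engine, not zlib."""

    def __init__(self, cold_after=10.0):
        self.pages = {}          # page_id -> (data, is_compressed)
        self.last_access = {}    # page_id -> last access timestamp
        self.cold_after = cold_after  # seconds before a page counts as "cold"

    def write(self, page_id, data: bytes):
        self.pages[page_id] = (data, False)
        self.last_access[page_id] = time.monotonic()

    def read(self, page_id) -> bytes:
        data, compressed = self.pages[page_id]
        if compressed:
            # Transparent decompression on access - the "CPU" never sees
            # compressed bytes, only the original page contents.
            data = zlib.decompress(data)
            self.pages[page_id] = (data, False)
        self.last_access[page_id] = time.monotonic()
        return data

    def background_sweep(self):
        # The HSM-like background task: compress pages that have not
        # been accessed recently to free up physical RAM.
        now = time.monotonic()
        for page_id, (data, compressed) in self.pages.items():
            if not compressed and now - self.last_access[page_id] > self.cold_after:
                self.pages[page_id] = (zlib.compress(data), True)
```

As described above, IBM puts this logic in the memory subsystem itself rather than in software, but the shape of the tradeoff is the same: hot pages stay uncompressed and fast, cold pages shrink to make room.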
If the system is a big server which is heavily multitasked, and the pattern of data use is not very predictable, this technique could cost performance. But in a system where data access tends to concentrate on "hot spots" and the absolute size of the available RAM is a constraint - such as an "in memory" database - the technique could add a lot of value.
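A rough back-of-the-envelope model shows why the access pattern matters so much. All the numbers below are illustrative assumptions on my part (a 60 ns RAM reference, a 2 microsecond penalty to fetch and decompress a cold page), not measurements of IBM's part:

```python
def effective_access_ns(hot_fraction, ram_ns=60.0, decompress_ns=2000.0):
    """Expected cost of a memory reference when cold pages live compressed.

    hot_fraction  -- share of references that hit uncompressed "hot" pages
    ram_ns        -- assumed plain RAM access latency (illustrative)
    decompress_ns -- assumed added cost of decompressing a cold page
    """
    return hot_fraction * ram_ns + (1.0 - hot_fraction) * (ram_ns + decompress_ns)


# Concentrated "hot spot" workload: almost every reference stays hot,
# so the average penalty is small and the extra effective RAM is a win.
print(effective_access_ns(0.99))   # ~80 ns

# Unpredictable, heavily multitasked workload: many cold references,
# and the decompression penalty dominates.
print(effective_access_ns(0.70))   # ~660 ns
```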