Dear Mary:
I also maintained and upgraded warehousing systems (they have the same priority and performance-boosting algorithms common in ERP). Milwaukee has the #1, #2, and #3 warehousing VARs (McHugh Freeman, HK Systems, and Catalyst). They have to plan for priority and throughput. That means full table scans with complex queries. Even so, the amount of processing time needed is still quite low compared to the disk read and write time.
The fastest disk subsystems (as opposed to SSD) are in the thousand-I/Os-per-second range. Even a 300 MHz Celeron can execute on the order of 100K instructions in the time of one I/O to or from disk. The latency of disk is the problem: comparing tens-of-nanoseconds DRAM access with milliseconds of disk access is no contest (about 5 orders of magnitude). That is why memory (disk caches) is so prized, and why SSD, even at the same bandwidths (160-320 MB/sec), is still preferred once the amount of memory is maxed out: its access times are in microseconds (1 to 2 orders of magnitude faster than disk).
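A back-of-the-envelope sketch of that arithmetic (the figures are the rough ones above plus an assumed effective instruction rate, not measurements):

    # Rough arithmetic behind the claims above; all figures are assumptions.
    effective_ips = 100e6   # ~100M instructions/sec effective on a 300 MHz Celeron (assumed IPC < 1)
    disk_iops = 1000        # fast disk subsystem: ~1000 random I/Os per second
    dram_ns = 50            # tens-of-nanoseconds DRAM access
    ssd_ns = 100e3          # SSD access in the ~100 microsecond range
    disk_ns = 5e6           # ~5 ms disk access

    print("Instructions per disk I/O: ~%.0fK" % (effective_ips / disk_iops / 1e3))  # ~100K
    print("Disk vs DRAM latency: ~%.0fx (about 5 orders of magnitude)" % (disk_ns / dram_ns))
    print("Disk vs SSD latency:  ~%.0fx (1 to 2 orders of magnitude)" % (disk_ns / ssd_ns))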
So do not fall into the trap of thinking that high bandwidth makes up for extremely long latency. The real reason TPCm scores go up with the larger boxes is memory size (the 8-ways have twice the memory of the 4-ways).
If you want a good overview of what TPC-C (TPCm) and TPC-D (DSS) do, and a little detail on what goes into getting good performance, look at tpc.org. ERP uses fixed, precompiled queries for maximum performance. Tuning the core queries and routines used to take up to 25% of the total effort; now it is cheaper to spend on more hardware (typically 10 GB of memory on a 2-CPU PA-RISC box 4 years ago). Lately, we can't get enough RAM to reach the 10% rule of thumb desired by database configurators. You could ask Tony which chipsets support the most memory and which have the most per CPU.
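As a hypothetical example of what that 10% rule of thumb implies (database size, headroom, and the 10% figure are all assumed round numbers):

    # Hypothetical sizing sketch: buffer cache at ~10% of the on-disk database size.
    db_size_gb = 100.0      # assumed database size
    cache_fraction = 0.10   # the 10% rule of thumb
    os_and_app_gb = 2.0     # assumed headroom for OS, executables, sort areas

    ram_target_gb = db_size_gb * cache_fraction + os_and_app_gb
    print("RAM target for a %.0f GB database: ~%.0f GB" % (db_size_gb, ram_target_gb))
    # A 2-CPU box with 10 GB of RAM covers roughly an 80 GB database under this rule;
    # larger databases are why "we can't get enough RAM".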
This is the real reason I suspect there are fewer than 2 CPUs per bus in the average Intel server shipment to VARs. You can see why clusters (MPP) are the fastest on the TPC-C and TPC-D benchmarks and why CPU speed does not correlate much with TPCm. Ask any technical person what happens to his app's performance when the working set is mostly out on disk versus completely in memory.
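To put a number on that question, here is a small sketch of average access time versus how much of the working set fits in memory (latencies assumed, same rough values as before):

    # Average access = hit_rate * memory_latency + (1 - hit_rate) * disk_latency.
    dram_ns = 50.0
    disk_ns = 5e6           # ~5 ms

    for in_memory in (1.00, 0.99, 0.90, 0.50):
        avg_ns = in_memory * dram_ns + (1.0 - in_memory) * disk_ns
        print("%3.0f%% of working set in memory -> avg access %9.0f ns (%.0fx all-in-RAM)"
              % (in_memory * 100, avg_ns, avg_ns / dram_ns))
    # Even a 1% miss rate to disk makes the average access about 1000x slower than all-in-memory.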
We argue here over whether 256K or 2M is better for an L2 cache, even when the larger cache runs at a lower speed, and that is for about one order of magnitude difference in latency. The difference is far more pronounced for disk. To combat it there are caches in main memory, on the controllers, and on the drives. And here, ten 18 GB 15K RPM disks are better than one 180 GB 7K RPM disk.
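A quick sketch of why the ten-spindle layout wins on random I/O (the per-drive IOPS figures are assumptions, not vendor specs):

    # More spindles working in parallel means more aggregate random IOPS.
    iops_15k = 180          # assumed random IOPS for one 15K RPM drive
    iops_7k = 80            # assumed random IOPS for one 7K RPM drive

    print("10 x 18 GB 15K drives: ~%d random IOPS" % (10 * iops_15k))  # ~1800
    print(" 1 x 180 GB 7K drive:  ~%d random IOPS" % (1 * iops_7k))    # ~80
    # Same total capacity (~180 GB), but roughly 20x the random I/O throughput.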
I hope this clears up your confusion.
Pete
PS: the above companies are not "two men and a dog". Typical big-customer warehousing systems (Fortune 500) are in the $10 to $100 million range, automation included. Spare me the dumb comments. Ford, GE, HP, IBM, P&G, Walmart, and such are not stupid.