Sorry, usually I just lurk, but...
/rant on
Let's take a look at that HP9000. Here's a minimal configuration for an HP9000/N4000 with a 440MHz PA-RISC, according to HP's list prices:
HP9000 N-Class Enterprise Server           $23,900.00
440 MHz PA-RISC 8500 CPU, 1.5MB cache      $24,000.00
Processor Support Module                   $ 1,800.00
Memory Carrier Board                       $ 5,000.00
512MB High Density SyncDRAM Memory Mod     $ 4,815.00
9GB HotPlug Ultra2 SCSI Low Prof Disk      $ 1,400.00
Dual Port FWD SCSI (PCI Bus) adapter       $ 1,400.00
CD-ROM (disk only)                         $   520.00
Factory integrated                         $   195.00
N-Class rack mount kit for HP Rack Sys     $   400.00
                                           ----------
                                           $63,430.00
(This is from the configurator on HP's website, and the FWD SCSI adapter was put in there by some rule, apparently; I'd selected a single-port, single-ended adapter.)
Yes, that's $24,000 for a single CPU. The N-class can hold up to eight of them, and there ain't no buy-seven-get-one-free deals.
Now, the system that AMD benchmarked used perfectly standard PC parts. It used a Western Digital EIDE disk drive, for God's sake. It had a single 128MB PC100 DIMM, so there couldn't have even been any interleaving. It ran Microsoft operating systems. There couldn't have been more than $2500 worth of hardware in that box and still it posted respectable SPEC numbers, even SPEC base numbers.
Now, where I work, my users crunch numbers. They crunch lots and lots of numbers, and then come back to crunch some more. I think I could buy them ten times the number of computers they have now (which is quite a lot already) and they would *still* have numbers left to crunch. They have so many numbers to crunch that there is simply no hope of it all ever getting done on a single machine. We use Suns for most of the crunching (no jokes here, please -- we've used them for 15 years and overall the ride's been good), and even a maxed-out Starfire wouldn't do any good. Plus, they tend to run extremely anti-social jobs from an SMP/multitasking perspective.
Ever see what a big Monte Carlo job can do (I pick Monte Carlo as an easy-to-explain example, not because it is the only thing we do)? A Monte Carlo analysis is where you have an equation and you want, for example, to integrate it over some region. So you conceptualize a box around the function in that region and randomly pick a bunch of data points, calculating for each one whether it falls above the function or below. After doing this a few million times, you take the fraction that fall below the function, scale it by the area of the box, and you have a reasonable estimate of your integral.

Now, the main point here is that, other than your equation (which, if you're doing this, is probably empirically derived and quite complex), all your data is synthetic. There is no reason to ever have to go out to disk. No reason to hit the network, or the display, or anything like that. All the code ever does is bang away at the ALU and the memory, relentlessly, over and over again.

This kind of job is absolute hell on a multi-tasking system because it never, ever gives up a time slice unless forced to. Most well-behaved jobs will at least take themselves out on occasion to do a disk I/O or something. But not a Monte Carlo job. In addition, if the calculations and the intermediate storage (the function, for example, could be stated as, say, a 1000x1000 matrix of functions that must be repeatedly applied to random vector data) are large, this kind of job can just kill whatever memory bandwidth your machine has. It can blow out any L2 cache ever made -- all locality-of-reference assumptions will be toast. Believe it or not, a single, well-designed Monte Carlo job can reduce that $63,000 HP9000 to nothing. That one ill-behaved job can suck up the entire machine and make it unusable for anything else. And, chances are, if it's a big enough job, it could take maybe a month to run through to completion.
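To make the shape of such a job concrete, here's a rough sketch in plain C of the hit-or-miss scheme I just described. The f() here is just a stand-in (a Gaussian bump), the seed and trial count are made up, and a real run would swap in whatever empirically-derived monster you actually care about and crank the loop count way up. The point to notice is the inner loop: nothing but arithmetic and memory traffic, no I/O anywhere.

    #include <stdio.h>
    #include <stdlib.h>
    #include <math.h>

    /* Stand-in for the real (empirical, complicated) function. */
    static double f(double x)
    {
        return exp(-x * x);
    }

    int main(void)
    {
        const double a = -2.0, b = 2.0;    /* integration interval    */
        const double ymax = 1.0;           /* top of the bounding box */
        const long   trials = 10000000L;   /* a "small" run           */
        long i, hits = 0;
        double x, y, box_area, estimate;

        srand(12345);

        for (i = 0; i < trials; i++) {
            /* Pick a random point inside the box [a,b] x [0,ymax]. */
            x = a + (b - a) * ((double)rand() / RAND_MAX);
            y = ymax * ((double)rand() / RAND_MAX);

            if (y <= f(x))    /* did it land under the curve? */
                hits++;
        }

        /* Fraction under the curve, scaled by the area of the box. */
        box_area = (b - a) * ymax;
        estimate = box_area * (double)hits / (double)trials;

        printf("estimate of integral: %f\n", estimate);
        return 0;
    }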
One would have to be a moron to tie up a $63,000 HP9000 for a month on a single Monte Carlo run, for two reasons. One is that, by virtue of the fact that such a job never hits the disk, the bulk of the advantage of something like the N-class -- the advantage being in the I/O -- is totally wasted. The other is that if you think about what I just described, it doesn't make a bit of difference whether your job is running on one computer or two; it's just a long sequence of calculations on random data (carefully constructed, uniformly-distributed random data though it might be). So, why not just run half the calculations on one machine, and the other half on a second machine? Why not indeed.
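And the split is trivial to arrange: give each machine half the trials and its own random seed, have it ship back nothing but a hit count, and pool the counts. A sketch of the bookkeeping, with made-up numbers standing in for what the two boxes would report:

    #include <stdio.h>

    int main(void)
    {
        /* Made-up results reported back from two machines, each having
           run half the trials with a different random seed.           */
        long hits_a = 2205417, trials_a = 5000000;
        long hits_b = 2204988, trials_b = 5000000;

        double box_area = 4.0;    /* same bounding box as before */

        /* Every trial is independent, so pooling the raw counts gives
           the same estimator as one big run on a single machine.      */
        double estimate = box_area * (double)(hits_a + hits_b)
                                   / (double)(trials_a + trials_b);

        printf("combined estimate: %f\n", estimate);
        return 0;
    }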
Mind you, we can't afford the high-end Suns, either. We actually integrate SPARCengine boards in-house into rack-mount cabinets. We can build a uniprocessor UltraSPARC system into a 2U chassis for less than $3K (full system, including disk, memory, Solaris, and network interface). A quad-processor AXmp system with 300MHz UltraSPARCs, 1GB of memory, mirrored 18GB system disks, and 100GB of RAID-5 storage hanging off a differential UltraSCSI port on the back end costs us about $30K ($20K system, $10K RAID). We build them this way in large part because it is the only way to afford that much computational power for our current software base.
My budget is fully allocated (and largely spent) for '99, but I'll tell you, come '00, I'll be buying some Athlons to put in racks; I'll load them up with Linux and we'll take them out for a spin.
But I would *never* do that with an HP, even if it were twice as fast as it is now. They're just too damned expensive, and in that sense I think they're actually irrelevant to this discussion. I'll take 25 Athlon systems (assuming they're available as described, which is really the point, no?) over a single HP9000/N4000 any day, and that would be true even if the HP had a 10-to-1 performance advantage over the Athlon. This is a bragging-rights thing that doesn't have anything to do with real buyers buying real computer hardware.
FWIW, we also integrate our own PCs, and we've used AMD chips in our basic PCs since the K5/133. (We use dual PIIs at the high end right now.) We have no regrets.
/rant off.
--Bob