To: fp_scientist who wrote (133106)
4/22/2001 7:06:41 PM
From: fyodor_

fp_scientist: In the last 3 years or so, a revolution in scientific computing started when people like myself started building clusters of PCs for floating-point numerically intensive computing.

I know exactly what you mean. We use a small cluster of dual P2s for our most demanding simulations (well, we do buy time on a couple of supercomputers, but not for any of the stuff I do). The IT guys are currently evaluating new platforms for a successor (slowly; everything has been delayed a couple of quarters).

The main contender was actually Willamette, but it turned out to do rather poorly on the specific "benchmarks" our programming guru had chosen as representative of what we do (which is a lot of fairly different stuff - from neural sims to more quantum chem-ish models). The selected benchmarks were hand-optimized in asm for Willy (using plenty of SSE2), but it still scored lower than the P6-optimized version run on an Athlon (although I should mention that our programming guru said the gain would likely be minimal even if the benchmarks were hand-tuned for the Athlon).

I was pretty disappointed with that, which is one of the reasons I'm sometimes a bit harsh on the P4... It still provides the best bandwidth (by far) of anything remotely reasonably priced - I'm just annoyed that Intel screwed the FP unit over. Heck, even using SSE2, the peak fp performance is equal to that of the Athlon.

fp_scientist: In my case, it all depends whether SSE2 can be used efficiently in libraries for matrix multiplication, and things like that.

I would really recommend writing the basic matrix operations in asm (using SSE2, of course). The operations are pretty basic and the reusability is great, making it an economically worthwhile endeavor.

-fyo
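
For illustration only, here is a minimal sketch of what one such basic matrix operation might look like with SSE2. It is written in C with compiler intrinsics rather than hand-written asm, and that choice, the function name `matmul_add`, and the row-major square layout are all assumptions, not anything from the benchmarks discussed above. It computes C += A * B two doubles at a time and falls back to plain scalar code when SSE2 is not available. It does no cache blocking or register tiling, so a properly tuned kernel would look quite different.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics: __m128d, _mm_mul_pd, ... */
#endif

/*
 * Hypothetical helper: C += A * B for row-major n x n matrices of doubles.
 * The i-k-j loop order makes the inner loop walk rows of B and C
 * contiguously, which is what lets it use 2-wide SSE2 loads and stores.
 */
void matmul_add(size_t n, const double *a, const double *b, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t i = 0; i < n; ++i) {
        for (size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            const double *brow = b + k * n;
            double *crow = c + i * n;
            size_t j = 0;
#if defined(__SSE2__)
            /* Broadcast A[i][k] into both lanes of an SSE2 register. */
            const __m128d va = _mm_set1_pd(aik);
            for (; j + 2 <= n; j += 2) {
                /* Unaligned loads, so callers need no special alignment. */
                __m128d vb = _mm_loadu_pd(brow + j);
                __m128d vc = _mm_loadu_pd(crow + j);
                vc = _mm_add_pd(vc, _mm_mul_pd(va, vb));
                _mm_storeu_pd(crow + j, vc);
            }
#endif
            /* Scalar tail (or the whole row when SSE2 is unavailable). */
            for (; j < n; ++j)
                crow[j] += aik * brow[j];
        }
    }
}
```

The caller is expected to zero C first if a plain product is wanted. Once a routine like this is tuned, everything built on top of it benefits, which is the reusability argument made above.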