
Technology Stocks : Intel Corporation (INTC)

To: fp_scientist who wrote (133106), 4/22/2001 7:06:41 PM
From: fyodor_
fp_scientist: In the last 3 years or so, a revolution in scientific computing started when people like myself started building clusters of PCs for floating-point numerically intensive computing.

I know exactly what you mean. We use a small cluster of dual P2s for our most demanding simulations (well, we do buy time on a couple of supercomputers, but not for any of the stuff I do). The IT guys are currently evaluating new platforms for a successor (slowly; everything has been delayed a couple of quarters). The main contender was actually Willamette, but it did rather poorly on the specific "benchmarks" our programming guru had chosen as representative of what we do, which covers a lot of fairly different stuff, from neural sims to more quantum-chem-ish models. The selected benchmarks were hand-optimized in asm for Willy (using plenty of SSE2), yet it still scored lower than the P6-optimized version running on an Athlon (although I should mention that our programming guru said the gain would likely be minimal even if the benchmarks were hand-tuned for the Athlon). I was pretty disappointed with that, which is one of the reasons I'm sometimes a bit harsh on the P4. It still provides the best bandwidth (by far) of anything remotely reasonably priced; I'm just annoyed that Intel screwed the FP unit over. Heck, even using SSE2, its peak FP performance is only equal to that of the Athlon.

In my case, it all depends on whether SSE2 can be used efficiently in libraries for matrix multiplication, and things like that.

I would really recommend writing the basic matrix operations in asm (using SSE2, of course). The operations are pretty basic and the reusability is great, making it an economically worthwhile endeavor.
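To give a feel for what an SSE2 matrix kernel looks like, here is a minimal sketch in C. It swaps hand-written asm for the SSE2 compiler intrinsics from <emmintrin.h>, which map closely onto the same packed-double instructions. The function name matmul_sse2 and the row-major n x n layout are assumptions for illustration only. It is a naive triple loop that computes two output columns per 128-bit register, with no cache blocking or register tiling, so a tuned library routine would be considerably faster.

```c
#include <stddef.h>
#ifdef __SSE2__
#include <emmintrin.h>  /* SSE2 packed-double intrinsics */
#endif

/* Hypothetical kernel: C = A * B for row-major n x n double matrices.
   Uses SSE2 when the compiler defines __SSE2__ (e.g. GCC/Clang on x86-64),
   otherwise falls back entirely to the scalar loop below. */
void matmul_sse2(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; ++i) {
        size_t j = 0;
#ifdef __SSE2__
        /* Compute C[i][j] and C[i][j+1] together in one __m128d. */
        for (; j + 2 <= n; j += 2) {
            __m128d acc = _mm_setzero_pd();
            for (size_t k = 0; k < n; ++k) {
                __m128d a = _mm_set1_pd(A[i * n + k]);   /* broadcast A[i][k] */
                __m128d b = _mm_loadu_pd(&B[k * n + j]); /* B[k][j], B[k][j+1] */
                acc = _mm_add_pd(acc, _mm_mul_pd(a, b));
            }
            _mm_storeu_pd(&C[i * n + j], acc);
        }
#endif
        /* Scalar tail for an odd column count (or the whole row without SSE2). */
        for (; j < n; ++j) {
            double sum = 0.0;
            for (size_t k = 0; k < n; ++k)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}
```

For example, multiplying A = {1,2,3, 4,5,6, 7,8,9} by B = {9,8,7, 6,5,4, 3,2,1} with n = 3 exercises both the packed path (columns 0 and 1) and the scalar tail (column 2).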

-fyo