Technology Stocks : Intel Corporation (INTC)


To: fp_scientist who wrote (133106) | 4/22/2001 1:57:42 PM
From: Scumbria
 
FP,

Have you considered using 64-bit embedded processors, like those from MIPS or Hitachi/ST? The SH-5 has math capabilities similar to SSE, will cost less than $20, and will deliver performance comparable to PC processors. For a few thousand bucks, you could build the most powerful processing engine in the world.

Also, are you familiar with John Bennett's work at Rice on processing clusters?

Scumbria



To: fp_scientist who wrote (133106) | 4/22/2001 7:06:41 PM
From: fyodor_
 
fp_scientist: In the last 3 years or so, a revolution in scientific computing started when people like myself started building clusters of PCs for floating-point numerically intensive computing.

I know exactly what you mean. We use a small cluster of dual P2s for our most demanding simulations (well, we do buy time on a couple of supercomputers, but not for any of the stuff I do). The IT guys are currently evaluating new platforms for a successor (slowly, since everything has been delayed a couple of quarters). The main contender was actually Willamette, but it turned out that it did rather poorly on the specific "benchmarks" our programming guru had chosen as representative of what we do (which is a lot of fairly different stuff, from neural sims to more quantum chem-ish models).

The selected benchmarks were hand-optimized in asm for Willy (using plenty of SSE2), but it still scored lower than the P6-optimized version run on an Athlon (and I should mention that our programming guru said the gain would likely be minimal even if the benchmarks were hand-tuned for the Athlon). I was pretty disappointed with that, which is one of the reasons I'm sometimes a bit harsh on the P4... It still provides the best bandwidth (by far) of anything remotely reasonably priced. I'm just annoyed that Intel screwed the FP unit over. Heck, even using SSE2, the peak fp performance is equal to that of the Athlon.

fp_scientist: In my case, it all depends on whether SSE2 can be used efficiently in libraries for matrix multiplication, and things like that.

I would really recommend writing the basic matrix operations in asm (using SSE2, of course). The operations are pretty simple and the reusability is great, which makes it an economically worthwhile endeavor.
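
To give a rough idea of what such a kernel could look like, here is a minimal sketch in C using SSE2 intrinsics rather than hand-written asm. It is not tuned (no blocking, no alignment tricks), the function name matmul_add and the row-major layout are just assumptions for illustration, and it falls back to plain scalar code when the compiler isn't targeting SSE2.

```c
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics: packed double-precision ops */
#endif

/* Hypothetical helper: C += A * B for row-major n x n matrices of doubles.
   The i-k-j loop order keeps the inner loop walking B and C contiguously,
   so two adjacent doubles can be handled per SSE2 instruction. */
void matmul_add(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            size_t j = 0;
#if defined(__SSE2__)
            /* Broadcast A[i][k] into both lanes of a 128-bit register. */
            const __m128d va = _mm_set1_pd(aik);
            for (; j + 2 <= n; j += 2) {
                /* Unaligned loads/stores: no alignment is assumed here. */
                __m128d vb = _mm_loadu_pd(&b[k * n + j]);
                __m128d vc = _mm_loadu_pd(&c[i * n + j]);
                vc = _mm_add_pd(vc, _mm_mul_pd(va, vb));
                _mm_storeu_pd(&c[i * n + j], vc);
            }
#endif
            /* Scalar tail (odd n), or the whole row if SSE2 is unavailable. */
            for (; j < n; ++j)
                c[i * n + j] += aik * b[k * n + j];
        }
    }
}
```

Zero C first, then call matmul_add(n, a, b, c). A serious version would add cache blocking and aligned loads, which is where the hand tuning really pays off.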

-fyo