Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: combjelly who wrote (20629), 11/27/2000 10:44:16 AM
From: fyodor_
 
<combjelly: It is my perception that to truly take advantage of SSE2 in anything but a very small subset of programs, there will be a need to make fundamental changes to the way programs are structured.>

Well, SSE is probably more widely applicable than SSE2. Remember, SSE2 is double-precision fp plus some 128-bit integer operations (a replacement for MMX, would be my interpretation). SSE can dramatically speed up single-precision fp, which is very widely used and often has a relatively high degree of (IL-)parallelism. The (software) prefetch instructions are also part of "SSE1".
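To make the single-precision point concrete, here is a rough sketch of what hand-written SSE code can look like, using the intrinsics from <xmmintrin.h>. The function name scale_floats and the prefetch distance of 16 floats are assumptions made up for illustration, not tuned values; builds without SSE fall through to the plain scalar loop.

```c
#include <stddef.h>
#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_mul_ps, _mm_prefetch */
#endif

/* Hypothetical helper: multiplies every element of data[0..n) by factor. */
void scale_floats(float *data, size_t n, float factor)
{
    size_t i = 0;
#if defined(__SSE__)
    __m128 f = _mm_set1_ps(factor);  /* factor copied into all four lanes */
    for (; i + 4 <= n; i += 4) {
        /* Software prefetch: a hint that data further ahead is needed soon.
           The distance of 16 floats is an illustrative guess. */
        if (i + 16 < n)
            _mm_prefetch((const char *)(data + i + 16), _MM_HINT_T0);
        __m128 v = _mm_loadu_ps(data + i);          /* load four floats */
        _mm_storeu_ps(data + i, _mm_mul_ps(v, f));  /* four multiplies at once */
    }
#endif
    /* Scalar loop handles the tail, or everything when SSE is unavailable. */
    for (; i < n; ++i)
        data[i] *= factor;
}
```

Calling scale_floats(buf, count, 2.0f) doubles every element of buf in place. Whether the prefetch helps at all depends on the access pattern and the CPU.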

What we need are basically two things:

- Compilers need to be a heck of a lot smarter.
- The standard libraries (both public and company-specific) need to be upgraded.

The latter of the two is happening... slowly, but it is happening. As for smarter compilers... well, I'm sorry to say that's in the hands of Microsoft.

I do wish someone else would come along and make some really great compilers...

-fyo



To: combjelly who wrote (20629), 11/27/2000 7:14:06 PM
From: Steve Porter
 
combjelly,

I think (with all due respect) you are missing a couple of key points here. First, SSE2, unlike MMX, SSE, and 3DNow!, doesn't require the loss of any precision, so a semi-intelligent compiler can insert SSE2 instructions anywhere it sees code like:

double a, b, c, d;

a = b = 3.14159;
c = a * 6;
d = b * 7;

a * 6 and b * 7 are independent of each other, so they can be done in parallel: a no-brainer for a compiler.
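For illustration, this is roughly what the two multiplications look like when packed by hand with the SSE2 intrinsics from <emmintrin.h>. It is only a sketch: the function name scale_pair is made up, and whether a given compiler actually emits a packed multiply for the snippet above is an assumption, not something shown here. Without SSE2 the function falls back to two scalar multiplies.

```c
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics: __m128d, _mm_mul_pd */
#endif

/* Hypothetical helper: computes out[0] = a * 6 and out[1] = b * 7. */
void scale_pair(double a, double b, double out[2])
{
#if defined(__SSE2__)
    /* _mm_set_pd takes the high lane first, then the low lane. */
    __m128d values  = _mm_set_pd(b, a);      /* low lane = a, high lane = b */
    __m128d factors = _mm_set_pd(7.0, 6.0);  /* low lane = 6, high lane = 7 */
    /* One packed double-precision multiply does both products. */
    _mm_storeu_pd(out, _mm_mul_pd(values, factors));
#else
    /* Scalar fallback when SSE2 is not enabled at compile time. */
    out[0] = a * 6.0;
    out[1] = b * 7.0;
#endif
}
```

Both lanes are full IEEE doubles, which is the "no loss of precision" point: the packed result matches what the two scalar multiplies would give.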

Additionally, the P4 will perform better with the next rev of MSVC, as it will be able to schedule instructions better for the P4. This can get you up to 20% more performance. Just look at the Athlon with the new Athlon-specific Flask optimizations: a 550 MHz K7 is turning 7.8 fps (or something).

Steve