Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: jcholewa who wrote (19218) 11/15/2000 2:14:25 PM
From: Charles R
 
<You said SSE2 optimizations. That means either double precision vector floating point (which doesn't usually apply to the games) or vector integer, which is unimpressive in the P4 (it's equal to regular MMX, but only if you make your code more explicitly parallel).>

I do not know how SSE2 is being used in game optimization.

Kap seems to have answered how the SpecFP optimizations are being done. As you can see, it has very little to do with prefetch and a lot to do with SSE2 optimizations. If Kap is right and there is only a hardware prefetch, I wouldn't be surprised if the prefetch gains don't amount to much. (In my experience, hardware prefetch can even lead to a performance decrease.)

<Please ask them for details. They may be speaking about CAD-type operations, which could be immensely assisted by SSE2's 4-way double precision vector floating-point capability.>

I will try to see if I can get any meaningful details.



To: jcholewa who wrote (19218) 11/15/2000 2:26:58 PM
From: kapkan4u
 
<Please ask them for details. They may be speaking about CAD-type operations, which could be immensely assisted by SSE2's 4-way double precision vector floating-point capability.>

First, packed SSE2 is 2-way double precision, not 4-way.

Second, you forget that SSE/SSE2 have both scalar and packed versions of the same instructions. Scalar SSE/SSE2 can be used anywhere to replace x87 arithmetic. The advantage is that you get a flat register file instead of the brain-dead x87 stack.

Kap
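
For anyone who wants to see the scalar/packed distinction from the post above in code, here is a minimal sketch using the standard SSE2 intrinsics from emmintrin.h. It assumes an x86 compiler with SSE2 enabled. The names add_packed, add_scalar and LANES are made up for illustration and do not come from any of the posts. The packed version should map to ADDPD (two doubles per 128-bit register, hence 2-way), and the scalar version to ADDSD, which is the kind of instruction that can stand in for x87 arithmetic.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics; assumes an x86 target with SSE2 */

/* Doubles per 128-bit XMM register: 16 bytes / 8 bytes = 2 (hypothetical name). */
enum { LANES = sizeof(__m128d) / sizeof(double) };

/* Packed SSE2: a single add operates on both double-precision lanes. */
void add_packed(const double a[2], const double b[2], double out[2]) {
    assert(LANES == 2);                 /* 2-way, not 4-way */
    __m128d va = _mm_loadu_pd(a);       /* unaligned load of two doubles */
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(out, _mm_add_pd(va, vb));
}

/* Scalar SSE2: only the low lane is added; operands are named
   registers rather than positions on the x87 stack. */
double add_scalar(double a, double b) {
    __m128d va = _mm_set_sd(a);         /* low lane = a, high lane = 0 */
    __m128d vb = _mm_set_sd(b);
    return _mm_cvtsd_f64(_mm_add_sd(va, vb));
}
```

Calling add_packed on {1.5, 2.5} and {10.0, 20.0} fills out with the two lane-wise sums, while add_scalar(1.25, 2.5) returns a single double. Whether a compiler emits exactly ADDPD/ADDSD for these depends on the compiler and flags, so treat the instruction mapping as the expected case, not a guarantee.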