Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: Petz who wrote (19866) 11/20/2000 5:13:16 PM
From: pgerassi
 
Dear John:

Optimized code substitutes SHR for /2 just as often as it substitutes SHL for x2. In both cases the shift was quicker than going through the multiply/divide unit, in both code space and time. Adding a value to itself was usually slower than a shift left on most CPUs, so that substitution was not used much. The shift right for a divide, however, has no usable substitute, and on the P4 all shifts and rotates have much longer latency (the missing barrel shifter is the real problem here). So these common assembly tricks are slower on the P4 than on any current mainstream CPU. This also means most optimizing compilers will generate suboptimal code for the P4 (hey, Intel was asking for it when they deleted the barrel shifter on the P4. Why???? Just DUMB, I guess).
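For readers who haven't seen these substitutions, here is a minimal C sketch of the equivalences being discussed (the function names are hypothetical, for illustration only). It assumes unsigned operands: for unsigned values a logical shift right by 1 equals division by 2, and a shift left by 1 equals both multiplying by 2 and adding the value to itself. Signed division is the awkward case, which is one reason the right-shift trick has no drop-in replacement.

```c
#include <stdint.h>

/* Divide by 2 for unsigned values: a logical shift right (SHR) gives the same result. */
uint32_t half_shr(uint32_t x) { return x >> 1; }

/* Multiply by 2 two ways: shift left (SHL) or add the value to itself. */
uint32_t twice_shl(uint32_t x) { return x << 1; }
uint32_t twice_add(uint32_t x) { return x + x; }

/* Caveat for signed values: x / 2 rounds toward zero, while an arithmetic shift
   right rounds toward negative infinity (e.g. -3 / 2 == -1, but -3 >> 1 == -2 on
   typical compilers). Compilers therefore add a fix-up before the shift for signed
   division. This helper adds the sign bit first so the shift rounds toward zero.
   Assumes >> on a negative int32_t is an arithmetic shift, which is
   implementation-defined in C but true of mainstream compilers. */
int32_t half_signed(int32_t x) {
    int32_t bias = (int32_t)((uint32_t)x >> 31); /* 1 if x is negative, else 0 */
    return (x + bias) >> 1;
}
```

Compilers already apply these rewrites for constant operands, so whether a shift is the fastest choice depends on that CPU's shift latency, which is exactly the P4 concern above.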

Pete



To: Petz who wrote (19866) 11/20/2000 8:04:29 PM
From: fyodor_
 
<Petz: you are correct about SHL being more useful than ROL, but I'm sure both have the high latency problem on the P4.>

Yes, both have a latency of 4, compared to 1 on the Athlon (and PIII?).
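To put those latency figures in perspective, here is a rough C sketch, under stated assumptions, of a dependent chain of rotates, where each step needs the previous result. Such a chain runs at about one step per rotate latency, so using the numbers quoted above (4 cycles on the P4, 1 on the Athlon) the same chain takes roughly four times as many cycles. The helper names and the simple n x latency estimate are hypothetical illustrations, not measurements. GCC and Clang usually compile the `(x << 1) | (x >> 31)` idiom to a single ROL, but an optimizer may also collapse the whole loop, so a real timing test would need to prevent that.

```c
#include <stdint.h>

/* Rotate left by 1 bit; mainstream compilers typically emit ROL for this idiom. */
uint32_t rol1(uint32_t x) { return (x << 1) | (x >> 31); }

/* Dependent chain: each rotate consumes the previous result, so execution time
   is bounded by rotate latency, not throughput. */
uint32_t rotate_chain(uint32_t x, unsigned n) {
    for (unsigned i = 0; i < n; i++)
        x = rol1(x);
    return x;
}

/* Back-of-envelope estimate: cycles for n dependent shifts/rotates at a given
   latency, ignoring loop overhead (an assumption for illustration only). */
unsigned long chain_cycles(unsigned long n, unsigned latency) {
    return n * latency;
}
```

With these assumptions, `chain_cycles(1000, 4)` gives 4000 cycles versus `chain_cycles(1000, 1)` at 1000 cycles.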

-fyo