Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: pgerassi who wrote (52889) 8/28/2001 10:14:42 PM
From: wanna_bmw
 
Pete, Re: "Better check on P4 micro architecture before putting your foot in your mouth."

Better check whose foot is in whose mouth. The Pentium 4 can do shifts, just not on the double-pumped ALUs, and most of its shifts are fully pipelined. I'll give you a second to get your foot out before I present the proof.

developer.intel.com

Page 294 - SSE-2 integer shifts. Take 2 cycles to complete, with a 2-cycle throughput (not fully pipelined).
Page 299 - MMX integer shifts. Take 2 cycles to complete, with single-cycle throughput (fully pipelined).
Page 301 - x86 general-purpose shift instructions (SAL/SAR/SHL/SHR). Take 4 cycles to complete, with single-cycle throughput (fully pipelined).
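
A rough way to see how those latency and throughput figures combine (a back-of-the-envelope sketch, not a simulation): assume a stream of independent shifts feeding one pipelined unit, with no dependencies, port conflicts or memory stalls. Then n instructions finish after latency + (n - 1) × throughput cycles. The helper `cycles_for` and the 64-word workload are made up for illustration. The latency/throughput numbers are the ones quoted from the manual above, and the lanes per register (eight 16-bit words per 128-bit XMM register, four per 64-bit MMX register, one per general-purpose shift) are assumptions about how the data is packed.

```python
def cycles_for(n: int, latency: int, throughput: int) -> int:
    """Idealized cycles for n independent instructions on one pipelined unit.

    Hypothetical helper: the first result appears after `latency` cycles and
    each later instruction can issue `throughput` cycles after the previous one.
    Ignores dependencies, other execution ports and memory.
    """
    if n == 0:
        return 0
    return latency + (n - 1) * throughput


# Latency/throughput figures as quoted above for the Pentium 4.
WORDS = 64  # 16-bit values to shift (arbitrary example size)

sse2 = cycles_for(WORDS // 8, latency=2, throughput=2)  # assumed 8 words per 128-bit XMM register
mmx = cycles_for(WORDS // 4, latency=2, throughput=1)   # assumed 4 words per 64-bit MMX register
gpr = cycles_for(WORDS, latency=4, throughput=1)        # assumed one word per SAL/SHL

print(f"SSE-2: {sse2} cycles, MMX: {mmx} cycles, x86 GPR: {gpr} cycles")
# SSE-2: 16 cycles, MMX: 17 cycles, x86 GPR: 67 cycles
```

In this toy model SSE-2 edges out MMX only narrowly, because its 2-cycle throughput gives back much of the benefit of the wider register; both leave one-value-at-a-time general-purpose shifts far behind.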

Obviously, setting up simultaneous shifts using SSE-2 would give the highest performance. The Athlon may have a lower, 1-cycle latency on its general-purpose shifts, but it sure can't line up a full 128-bit register of packed data (eight 16-bit words, say) and shift all of them simultaneously in 2 clock cycles. Try again, Pete. I should have left you on ignore.
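
To make the "shift all of them at once" point concrete, here is a pure-Python model of what a packed left shift such as SSE-2's PSLLW computes: every lane of a 128-bit value moves left by the same count, and no bits carry from one lane into the next. It models the semantics only, not the speed, and `packed_shl` and `lane_mask` are hypothetical names. SSE-2 has no per-byte shift instruction, so the byte case (`lane_bits=8`) corresponds to the usual workaround of a 16-bit PSLLW followed by a PAND with a byte mask; shifting the whole value and masking each lane, as below, gives the same result.

```python
REG_BITS = 128  # width of an SSE-2 XMM register


def lane_mask(pattern: int, lane_bits: int) -> int:
    """Repeat a lane_bits-wide pattern across every lane of a 128-bit value."""
    mask = 0
    for i in range(REG_BITS // lane_bits):
        mask |= pattern << (i * lane_bits)
    return mask


def packed_shl(x: int, count: int, lane_bits: int) -> int:
    """Shift every lane of a 128-bit value left by `count` bits.

    Shifting the whole value lets bits spill across lane boundaries; the
    mask then clears the low `count` bits of each lane, which are exactly
    the bits that leaked in from the lane below. It also drops anything
    shifted past bit 127. A count >= lane_bits zeroes every lane.
    """
    lane_all = (1 << lane_bits) - 1
    keep = (lane_all << count) & lane_all  # bits that stay inside a lane
    return (x << count) & lane_mask(keep, lane_bits)


# Sixteen byte values 0x01..0x10 packed into one 128-bit value (byte 0 lowest).
data = int.from_bytes(bytes(range(1, 17)), "little")
shifted = packed_shl(data, 1, lane_bits=8)
print(list(shifted.to_bytes(16, "little")))
# [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32]
```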

wanna_bmw