Technology Stocks : Advanced Micro Devices - Moderated (AMD)

To: fyodor_ who wrote (21751)
Date: 12/7/2000 12:49:59 AM
From: Petz
 
fyo, thanks for digging up that P4 info on DP multiply/add latency and throughput.

I was actually surprised that the SSE2 multiply (MULPD) has a throughput of two 64-bit multiplies every 2 clock cycles. That is twice as good as the P3.

It's really peculiar that the x87 64-bit multiply still takes two cycles. Perhaps they wanted SSE2 performance to be clearly superior to x87 performance to encourage adoption by programmers, thereby putting Athlon at a disadvantage.
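
To put rough numbers on that comparison, here is a small sketch of the arithmetic. It only restates the figures quoted above (MULPD: two 64-bit multiplies per 2 clocks; x87 multiply: one per 2 clocks); the function and variable names are made up for illustration, and it says nothing about latency or real-world code.

```python
from fractions import Fraction

def multiplies_per_cycle(results_per_issue, cycles_per_issue):
    """Sustained throughput in 64-bit multiplies per clock (hypothetical helper)."""
    return Fraction(results_per_issue, cycles_per_issue)

# Figures as quoted in this post, not measured values.
sse2_mulpd = multiplies_per_cycle(2, 2)  # MULPD: two products every 2 clocks
x87_fmul = multiplies_per_cycle(1, 2)    # x87 multiply: one product every 2 clocks

ratio = sse2_mulpd / x87_fmul
print(f"SSE2 MULPD: {sse2_mulpd} multiplies/clock")
print(f"x87 FMUL:   {x87_fmul} multiplies/clock")
print(f"SSE2 advantage on back-to-back multiplies: {ratio}x")
```

On those assumptions, a loop dominated by independent double-precision multiplies could at best run twice as fast in SSE2 as in x87 on the P4.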

Possibly the dependencies between FADD and FMUL have been lessened in x87 mode. For example, the P3 could not do two FADDs and one FMUL in two clock cycles because the FADD and FMUL units shared hardware. That restriction may have been eliminated in the P4, which would speed up x87 mode for most code, since most code has more FADDs than FMULs.

Petz