Politics : Formerly About Advanced Micro Devices

To: Ali Chen who wrote (93563), 2/16/2000 12:41:00 PM
From: kash johal   of 1574717
 
Ali,

Some more Willy (Willamette) stuff.
This is from JC's thread:

Willy : Instruction latencies.. FPU.. and more thoughts

Posted by Remnant on Tuesday, 15 February 2000, at 8:25 p.m.

(here: developer.intel.com)

In the code optimization section, the following stood out: INSTRUCTION LATENCIES! As you can imagine with such a long pipeline, these are much increased. Check out the examples they gave:

Shift instructions were 1 cycle on the P6 core. On Willamette, they are 2-4 cycles.

Integer and floating-point multiply: 4 cycles on the P6 family; on Willamette, "as many as 10" cycles.

The FXCH instruction, used to optimize P6 floating-point code, is no longer a nearly free instruction. It now has penalties involved, and "should be avoided in Willamette family processors".

Latencies generally go up with a longer pipeline, but these are significant increases. The real kicker is FXCH, which is currently used in optimized FPU code to achieve the highest speed on P2/3 CPUs. If this instruction carries penalties on Willamette, that is bad news for all existing code.
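To put the quoted cycle counts in perspective, here is a rough back-of-the-envelope sketch. It assumes a chain of dependent instructions where latency is fully exposed, which is a worst case; pipelined, independent instructions can hide much of it. The function names and the idea of a "break-even clock ratio" are mine, not from the Intel document, and the only inputs are the cycle counts quoted above (taking the upper end of each range).

```python
def break_even_clock_ratio(old_cycles: float, new_cycles: float) -> float:
    """Clock multiple the new core needs so that one instruction
    takes the same wall-clock time as on the old core."""
    return new_cycles / old_cycles


def latency_ns(cycles: float, clock_mhz: float) -> float:
    """Wall-clock latency in nanoseconds for a given cycle count and clock."""
    return cycles * 1000.0 / clock_mhz


# Cycle counts as quoted in the post: (P6, Willamette upper bound).
QUOTED_LATENCIES = {
    "shift": (1, 4),
    "multiply": (4, 10),
}

if __name__ == "__main__":
    for name, (p6, willamette) in QUOTED_LATENCIES.items():
        ratio = break_even_clock_ratio(p6, willamette)
        print(f"{name}: Willamette needs {ratio:.1f}x the P6 clock to break even")
```

On those numbers, a dependent multiply chain would need about 2.5 times the P6 clock just to match it, and a worst-case shift chain about 4 times.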

Also, in the whole datasheet I saw no mention of any improvements made to the P3 FPU core other than a load/save state operand. It seems Intel is betting the whole farm on the extended SSE instructions. This has both benefits and drawbacks.

Benefits :
Potentially faster.
Simpler for them to design than a new x87 FPU.
With the new compiler out, people WILL start using SSE more.

Cons :
Code needs to be optimized for SSE to get anything out of it.
With double precision, you can only work on two 64-bit floats at once. Since I see no mention of a second SSE pipeline, I'm not sure this will be significantly faster than an advanced x87 FPU (i.e. Athlon).
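The "two 64-bit floats at once" point follows from the 128-bit width of an SSE register. A minimal sketch of that arithmetic is below; the helper names are hypothetical, and it only counts packed operations. It says nothing about how many cycles each one takes on Willamette, which the post does not give.

```python
SSE_REGISTER_BITS = 128  # width of one SSE (XMM) register


def lanes(element_bits: int, register_bits: int = SSE_REGISTER_BITS) -> int:
    """How many elements of the given width fit in one register."""
    return register_bits // element_bits


def packed_ops_needed(n_values: int, element_bits: int) -> int:
    """Packed operations needed to process n_values elements (rounded up)."""
    per_op = lanes(element_bits)
    return -(-n_values // per_op)  # ceiling division


if __name__ == "__main__":
    print("single-precision lanes:", lanes(32))
    print("double-precision lanes:", lanes(64))
    print("packed ops for 8 doubles:", packed_ops_needed(8, 64))
```

So moving from single to double precision halves the work done per packed instruction, which is why a single SSE pipeline may not pull far ahead of a strong x87 unit on double-precision code.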