To: Jim McMannis who wrote (44911) 1/5/1999 7:16:00 PM
From: Petz
I think that no matter what the raw double precision FPU speed of a processor is, the 3DNOW! and/or Katmai instructions will execute 2 to 4 times faster, just by adding some logic to do single precision floating point in parallel. So, if the K7 double precision FPU is four times faster than the K6's, its 3DNOW implementation will be 8 to 16 times faster. At that rate, however, access to the L1 cache, L2 cache or main memory may well limit the maximum throughput.

The reason for the range of "2 to 4" is that 'add' instructions nominally take half as much time (or half the number of gates) in SP as in DP (single precision, double precision), while 'multiply' instructions take one quarter as much time.

I assume that both AMD and Intel implement 3DNOW-like instructions by borrowing the multipliers from the regular FPU. The FPU would be redesigned to do either four 28 bit by 28 bit multiplies in two clock cycles or one 56 bit by 56 bit multiply in two clock cycles. The latter operation can be broken down into 28 by 28 bit multiplies of all combinations of the upper and lower 28 bits of the 56 bit operands (see the sketch at the end of this post). These partial products then have to be combined with some added hardware (adders) for double precision results. For single precision results, additional hardware is needed to handle the exponents of the numbers.

Well, if you could follow this, you're GOOD. In simple terms, 3DNOW is likely to add the same percentage improvement to processing no matter how fast the FPU is, until the process becomes data-starved, i.e., limited by cache and memory access.

Another point: the longer the latency of the FPU hardware, the more additional hardware it takes to implement SIMD single precision floating point on top of it. Intel's Pentium II FPU has latencies more than twice as long as AMD's, so it will take a bigger addition to the FPU to implement Katmai than it took AMD to implement 3DNOW.

Petz
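
[Editor's sketch, not part of the original post: the C program below illustrates the decomposition described above, building a 56 x 56 bit multiply out of the four 28 x 28 bit partial products of the operands' upper and lower halves and combining them with adders. The 28/56 bit widths are the ones the post uses; real FPU datapaths differ, and exponent handling, normalization and rounding are omitted. The 128-bit cross-check uses a GCC/Clang extension.]

/*
 * Illustrative only: a 56x56-bit integer multiply assembled from four
 * 28x28-bit partial products, combined with shifts and adds (the
 * "added hardware" the post refers to).
 */
#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

#define HALF_BITS 28
#define HALF_MASK ((1ULL << HALF_BITS) - 1)
#define FULL_MASK ((1ULL << (2 * HALF_BITS)) - 1)

/* Multiply two 56-bit integers via four 28x28 partial products.
 * The 112-bit result is returned as two 56-bit halves:
 * *hi = bits 56..111, *lo = bits 0..55. */
static void mul56x56(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
{
    uint64_t a_lo = a & HALF_MASK, a_hi = a >> HALF_BITS;
    uint64_t b_lo = b & HALF_MASK, b_hi = b >> HALF_BITS;

    /* All combinations of the upper and lower 28 bits, as in the post. */
    uint64_t p_ll = a_lo * b_lo;   /* weight 2^0  */
    uint64_t p_lh = a_lo * b_hi;   /* weight 2^28 */
    uint64_t p_hl = a_hi * b_lo;   /* weight 2^28 */
    uint64_t p_hh = a_hi * b_hi;   /* weight 2^56 */

    /* Combine the partial products with adders; none of these sums
     * can overflow 64 bits for 28-bit half-operands. */
    uint64_t mid = p_lh + p_hl;
    uint64_t t   = p_ll + ((mid & HALF_MASK) << HALF_BITS);
    *lo = t & FULL_MASK;
    *hi = p_hh + (mid >> HALF_BITS) + (t >> (2 * HALF_BITS));
}

int main(void)
{
    uint64_t a = 0xFEDCBA9876543ULL;   /* arbitrary operands under 56 bits */
    uint64_t b = 0x123456789ABCDULL;
    uint64_t hi, lo;

    mul56x56(a, b, &hi, &lo);
    printf("partial-product result: hi=%014" PRIx64 " lo=%014" PRIx64 "\n", hi, lo);

    /* Cross-check against a full 128-bit multiply (GCC/Clang extension). */
    unsigned __int128 ref = (unsigned __int128)a * b;
    printf("reference result:       hi=%014" PRIx64 " lo=%014" PRIx64 "\n",
           (uint64_t)(ref >> (2 * HALF_BITS)), (uint64_t)(ref & FULL_MASK));
    return 0;
}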