To: Saturn V who wrote (5118) 8/15/2000 3:26:37 PM From: pgerassi

Dear Saturn:

Re: 286 Pipeline

Saturn, the 286 was not pipelined. It executed microcode and took many clock cycles to complete one instruction. The 287 was no faster than an equivalently clocked 8087. The 286's speed came from its execution and decode units: much more was done in hardware than in microcode. Pipelining as it is currently used did not start until the 386, and even then it appeared first in the FPU (the 387). The 486 or the Pentium (I do not remember which) was the first in the x86 line to truly pipeline. The Pentium II, the K6, and all subsequent x86 CPUs use RISC cores with hardware decoders instead of true CISC, and these are all pipelined, as the technique was well understood by then.

Embedded code and device drivers make heavy use of data-driven jump tables (vectors). These kinds of branches cannot be reliably predicted by most branch predictors currently in use, so for this type of code pipeline stalls are very frequent. Since such code is most prevalent in operating systems and servers, shorter pipelines tend to handle it better than longer ones. This is why, for heavy multi-user tasks, the K6-3 would outrun higher-clocked P2s and P3s. Long pipelines are better when the CPU spends most of its time in an inner loop of some kind: transformations, FFTs, and other small-code-over-large-data situations.

Furthermore, doubling the ALU clock frequency does not speed up the decode pipe; it just makes the pipeline temporarily shorter. This probably provides only a small increment in IPC, 1 to 2% at most, except in certain very constrained situations. The Athlon has an IPC of about 2.1, and I believe its pipeline is one or two stages shorter than Coppermine's, which is at about 1.9. This goes against your initial assumption.
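The data-driven dispatch described above can be sketched as follows (a minimal Python illustration of the control-flow pattern; the handler names and opcodes are made up for the example, and the real-world context would of course be C or assembly in a driver or OS kernel):

```python
# Illustrative sketch of a data-driven jump table (vector table).
# The target of the indirect call depends on incoming data, not on any
# repeating code pattern, so a branch predictor of that era has nothing
# stable to learn and each dispatch risks a full pipeline flush.

def handle_read(arg):   # hypothetical handler
    return ("read", arg)

def handle_write(arg):  # hypothetical handler
    return ("write", arg)

def handle_reset(arg):  # hypothetical handler
    return ("reset", 0)

VECTORS = [handle_read, handle_write, handle_reset]  # the jump table

def dispatch(stream):
    # Each (opcode, arg) pair selects a different table entry: an
    # indirect branch whose target is driven purely by the data.
    return [VECTORS[opcode](arg) for opcode, arg in stream]

print(dispatch([(0, 7), (2, 99), (1, 3)]))
# -> [('read', 7), ('reset', 0), ('write', 3)]
```

On a short pipeline each mispredicted dispatch throws away only a few cycles of work; on a long one the same miss discards many more stages, which is the mechanism behind the K6-3 observation above.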
Most believe that the Willamette will pay a penalty for its longer pipeline, but no one yet has a good idea of its size without simulation or empirical data. Current estimates of the hit range from 5% to 50%, with the average around 15% to 25%. That is not bad if the Willamette clocks 40% higher, but it is a disaster if it clocks only 15% higher or less (since overall performance is clock speed times IPC).

However, since each doubling of pipeline length has returned less overall speed improvement, sooner or later overall performance will stop gaining and even start to lose ground. IMHO, this limit may be reached (or exceeded) by the Willamette unless significant improvement is made to compiler technology, current coding styles, the underlying architecture, or some combination of the above. Intel may have gone too far, but one will not know without trying it. We shall see when we get samples to test.

Pete
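The break-even arithmetic above (overall performance = clock speed times IPC) can be checked with a small sketch; the 40% and 15% clock gains and the 20% IPC hit plugged in below are taken from the figures in the post, and the function name is just for illustration:

```python
# performance = clock_speed * IPC, the model used in the post

def relative_perf(clock_gain, ipc_hit):
    """Speedup of a longer-pipeline part versus its baseline, given a
    fractional clock-speed gain and a fractional IPC penalty."""
    return (1.0 + clock_gain) * (1.0 - ipc_hit)

# 40% higher clock against a 20% IPC hit: still a net win
print(round(relative_perf(0.40, 0.20), 3))  # -> 1.12
# only 15% higher clock against the same hit: a net loss
print(round(relative_perf(0.15, 0.20), 3))  # -> 0.92
```

This is exactly why the post calls the middle of the estimated range "not bad" at +40% clock and "a disaster" at +15% or less: the product drops below 1.0, i.e. below the shorter-pipeline baseline.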