To: Saturn V who wrote (5158)
8/15/2000 6:19:25 PM
From: pgerassi

Dear Saturn:

Re: 286

I suggest you look at the documentation again. The 286 is not pipelined. There is no branch misprediction penalty. There is a penalty for taking a branch, because the 286 had to generate the target address, and the cost depended on the type of address used. That was documented as 5 cycles if the branch is not taken, 8 cycles for immediate, 10 cycles for near offset, 12 cycles for far offset, 10 cycles for register, etc. Notice that the cost does not change whether the branch is taken once or five hundred times in a row.

Now on a P3, if it does not correctly predict a branch, all speculative instructions executed after the wrong path was assumed must be flushed. The work lost on these instructions is what is called the branch misprediction penalty. It is typically the length of the pipe between instruction fetch and the conditional test, since that is the section of the pipe that must be flushed before the correct instructions can begin to be processed. Note that a vector jump (i.e., jmp *0x0(,%eax,4)) is almost impossible to predict because there are millions of possible jump locations (I believe the P3 and all current x86 CPUs simply assume a jump of zero (no jump)), so a branch mispredict penalty is almost certain.

Now I do accept that a pipelined processor may not have a branch mispredict penalty if it does not speculate after branch instructions. In the original RISC systems like SPARC, the instruction after a branch is always executed whether the branch is taken or not. This allows for no branch misprediction penalty in a two-stage pipeline of fetch and execute. But this is an advantage for very short pipelines only.

Pipelining only makes sense once the number of instructions executed per clock rises near 1. All RISC-based CPUs executed at least one instruction per clock (register loads and stores to memory required an extra cycle due to FSB limitations).
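
To put rough numbers on the earlier contrast between a fixed branch cost and a misprediction penalty, here is a minimal Python sketch. The 5 and 8 cycle defaults are the 286 figures quoted above; the function names, the 1-cycle cost of a correctly predicted branch, and the 10-stage flush depth are assumptions chosen only for illustration, not documented figures for the P3 or any other CPU.

```python
def fixed_branch_cycles(n_branches, taken_fraction, not_taken=5, taken=8):
    """Non-speculating CPU: cost depends only on taken vs. not taken.

    There is no history effect, so the same branch costs the same
    whether it is the first or the five-hundredth time it runs.
    """
    n_taken = n_branches * taken_fraction
    return n_taken * taken + (n_branches - n_taken) * not_taken


def predicted_branch_cycles(n_branches, mispredict_rate, base=1, flush_depth=10):
    """Speculating pipeline: every mispredict throws away flush_depth cycles.

    base and flush_depth are hypothetical values, not measured ones.
    """
    return n_branches * (base + mispredict_rate * flush_depth)


if __name__ == "__main__":
    print("fixed cost, half taken:", fixed_branch_cycles(1000, 0.5))
    for rate in (0.0, 0.5, 1.0):
        print("mispredict rate", rate, "->", predicted_branch_cycles(1000, rate))
```

Under these assumed numbers the speculating pipeline wins easily when prediction works, and the gap shrinks or reverses as the mispredict rate approaches 1, which is the situation described for the vector jump.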
RISC CPUs were superscalar when they could execute more than one instruction per cycle. Since very few CPUs took more than two cycles to execute consecutive instructions (the FPU on the K6 was not considered to be pipelined even though it took dozens of cycles to execute an FPU instruction like ATAN but could accept a new FPU instruction every other cycle), any CPU that regularly took more than two cycles to execute a simple instruction like an "or" is considered to be not pipelined. A superscalar CPU might not be pipelined, but I do not know of any. Thus the 286, which took 2 or 3 cycles to do an "or ah,al", could not be considered pipelined. It averaged about 4 to 8 cycles per instruction, whereas the 8086 took about 10 cycles per instruction. The 386 took about 3 cycles and the 486 took 1 cycle per instruction. Thus, the 486 is the first x86 CPU that could have been pipelined. The Pentium must have been pipelined because it performed branch prediction.

The main reason why the P2 had a higher IPC than the Pentium was that it used out-of-order execution in conjunction with a larger superscalar, superpipelined RISC core. The ideal pipelined processor with perfect branch prediction will only execute one instruction per cycle, an IPC of 1. It takes superscalar processing (multiple pipelines, coprocessors, and/or execution units) to get an IPC above 1.

Pete
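
The IPC ceiling described above can be sketched with the usual fill-then-stream model of a pipeline. This is only a rough illustration: it assumes independent instructions, no stalls, and perfect branch prediction, and the function names, the `width` parameter, and the 5-stage depth in the usage lines are assumptions, not figures for any real CPU.

```python
import math


def pipeline_cycles(n_instr, stages, width=1):
    """Cycles to finish n_instr instructions on an idealized pipeline.

    The first group needs `stages` cycles to reach the end of the pipe;
    after that one group of up to `width` instructions completes per cycle.
    """
    if n_instr == 0:
        return 0
    groups = math.ceil(n_instr / width)
    return stages + groups - 1


def ipc(n_instr, stages, width=1):
    """Instructions per cycle under the same idealized assumptions."""
    return n_instr / pipeline_cycles(n_instr, stages, width)


if __name__ == "__main__":
    # Single pipeline: IPC approaches 1 but never exceeds it.
    print("scalar, 5 stages:", ipc(1000, 5))
    # Two pipelines side by side (superscalar): IPC can exceed 1.
    print("2-wide, 5 stages:", ipc(1000, 5, width=2))
```

With `width=1` the result stays at or below 1 no matter how many instructions are fed in, and only a `width` above 1 pushes the IPC past 1.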