To: Saturn V who wrote (5141) 8/15/2000 4:33:57 PM
From: pgerassi

Dear Saturn:

Re: 286

I repeat! The 286 (80286) is not pipelined. Every instruction on it was predictable: given the data, one could calculate exactly how many cycles it took to do a job. Loop unrolling made code run faster by eliminating the jumps and the cycles they took. The same thing could be done on the 8086 (or 8088), and it was.

For there to be a branch mispredict penalty, taking a jump at one time must cost a different number of cycles than taking the same jump at another time. On the 286, a jump never took longer or shorter than it did any other time it was taken. This contradicts your assertion that the 286 had a branch mispredict penalty. I believe you will not find a misprediction penalty in the clock cycle range on any Intel x86 CPU until the 486 at the earliest, and possibly not until the Pentium Pro.

A conditional jump took one number of cycles when the CPU jumped to the target and a smaller number when it did not. The difference comes from the extra cycles required to compute the new address. FPU instructions vary in clock cycles with the data simply because an operation like division can terminate early once the remainder becomes zero. But given the data, you can determine exactly how many cycles the operation takes to complete. Data flow determined whether additional wait instructions were required (the CPU sits idle until the data becomes available). Remember this simple rule of thumb: "There is no branch misprediction penalty if there is no time dependency on how long it takes to branch."
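To make the rule of thumb concrete, here is a toy timing model of a CPU with no pipeline and no branch prediction, written as a Python sketch. The cycle costs (JCC_TAKEN, JCC_NOT_TAKEN, BODY) are hypothetical placeholders, not published 286 timings; only their shape follows the post: a taken jump costs more than a fall-through, and the cost depends on nothing but the outcome of that one jump.

```python
# Toy timing model of a CPU with no pipeline and no branch prediction.
# All cycle counts are hypothetical placeholders, not published 286 timings.
JCC_TAKEN = 7      # conditional jump taken: extra cycles to compute the target
JCC_NOT_TAKEN = 3  # conditional jump not taken: falls through
BODY = 10          # one pass through the loop body


def branch_cost(taken: bool) -> int:
    """Cost depends only on this branch's outcome, never on earlier branches."""
    return JCC_TAKEN if taken else JCC_NOT_TAKEN


def branch_sequence_cycles(outcomes: list[bool]) -> int:
    """Total branch cycles for a sequence of outcomes (order is irrelevant)."""
    return sum(branch_cost(taken) for taken in outcomes)


def loop_cycles(n: int) -> int:
    """Cycles for a bottom-tested loop whose body runs n >= 1 times."""
    total = 0
    for i in range(n):
        total += BODY
        # Jump back on every pass except the last, which falls through.
        total += branch_cost(taken=(i < n - 1))
    return total


if __name__ == "__main__":
    print(loop_cycles(4))  # 4 * 10 + 3 * 7 + 3 = 64
```

Because branch_cost looks only at the current outcome, reordering the same outcomes cannot change the total, and loop_cycles(4) returns the same count on every call. A mispredict penalty would require the cost to depend on the history of earlier branches, which is exactly what this model leaves out.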
Exceptions are cache waits (a problem found in 386s), DRAM refresh, instruction interactions, and interrupts, both software and hardware. Even the 8086 had instruction interactions: you could not issue two x87 instructions in a row, because the second one had to wait for the first to finish, and some took 1000+ cycles to complete (think ATAN).

Loop unrolling is a way to eliminate n - 1 jumps entirely for a loop unrolled n times. The penalty is the larger amount of code required. On certain x86 processors, some loops could be sped up by simply unrolling them and calculating a jump into the unrolled sequence, so that only one jump is made instead of n + 1 tests (eliminating branches to take or mispredict). Sometimes you did not have the memory to spare. You may not remember (or may not even have been born yet) when memory cost thousands of dollars a kilobyte, and in the embedded world, saving 1 cent per unit can return ten million dollars (on 1 billion watches).

Pete
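The computed jump into an unrolled sequence can be sketched as follows. This is a Python model, not x86 code: MAX_N, rolled_copy and unrolled_copy are hypothetical names, and since Python has no computed goto, the straight-line copies are simulated and only the tests and jumps are counted.

```python
MAX_N = 8  # hypothetical: longest count the unrolled sequence supports


def rolled_copy(src, dst, n):
    """Top-tested loop; returns how many compare-and-branch tests ran."""
    tests = 0
    i = 0
    while True:
        tests += 1      # compare i with n and branch
        if i >= n:
            break       # the (n + 1)-th test exits the loop
        dst[i] = src[i]
        i += 1
    return tests


def unrolled_copy(src, dst, n):
    """Fully unrolled copy entered through one computed jump; returns jumps made.

    Python has no computed goto, so the MAX_N straight-line copies are
    simulated by the for-loop below. In assembly they would be MAX_N
    consecutive moves with no branches between them.
    """
    if not 0 <= n <= MAX_N:
        raise ValueError("n does not fit the unrolled sequence")
    entry = MAX_N - n   # computed target: skip the copies that are not needed
    jumps = 1           # the single jump into the sequence
    for k in range(entry, MAX_N):  # stands in for branch-free straight-line code
        i = MAX_N - 1 - k          # copies indices n - 1 down to 0
        dst[i] = src[i]
    return jumps


if __name__ == "__main__":
    src = [10, 20, 30, 40, 50]
    a, b = [0] * 5, [0] * 5
    print(rolled_copy(src, a, 5), unrolled_copy(src, b, 5))  # 6 1
    print(a == src, b == src)  # True True
```

For n = 5 the rolled loop executes 6 tests while the unrolled version makes 1 jump. The price is the one the post names: the unrolled sequence always occupies MAX_N copies of the body, whether or not a given call uses them.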