Silicon Investor (SI) -- The First Internet Community


Technology Stocks : Advanced Micro Devices - Moderated (AMD)

To: Saturn V who wrote (5141)	8/15/2000 4:33:57 PM
From: pgerassi

Dear Saturn: Re: 286

I repeat: the 286 (80286) is not pipelined. Every instruction on it was predictable in that, given the data, one could calculate exactly how many cycles a job would take. Loop unrolling executed faster by eliminating the jumps and the cycles it took to do them. On the 8086 (or 8088, for that matter), the same thing could be done, and was.

For there to be a branch mispredict penalty, there must be a difference between taking a jump at one time and taking the same jump at another time. On the 286, a jump never took longer or shorter than any other time it was taken. This contradicts your assertion that there was a branch mispredict penalty for the 286. I believe you will not find a misprediction penalty in the clock cycle range for any x86 Intel CPU until the 486 at the earliest, and possibly as late as the Pentium Pro.

A conditional jump took one number of cycles when the CPU jumped to the target and a smaller number when it did not. This is due to the extra cycles required to compute the new address. The FPU instructions vary in clock cycles based on the data, simply because an operation like division can be terminated when the remainder becomes zero. But given the data, you can determine exactly how many cycles it would take to complete the operation. Data flow determined whether additional wait instructions were required (the CPU is idle until the data becomes available).

Remember this simple rule of thumb: "There is no branch misprediction penalty if there is no time dependency on how long it takes to branch." Exceptions are cache waits (a problem found in 386s), DRAM refresh, instruction interactions (even the 8086 had these: you could not issue two x87 instructions in a row, because the second has to wait for the first to finish, and some took 1000+ cycles to complete (think ATAN)), and interruptions, both soft and hard.

Loop unrolling is a way to eliminate entirely n - 1 jumps for a loop unrolled n times. The penalty is the larger amount of code required. In certain x86 processors, some of the loop instructions could be sped up by simply unrolling the loop and calculating the jump into the unrolled sequence, as only one jump is made versus n + 1 tests (eliminating branches to take or mispredict). Sometimes you did not have the memory to spare. You may not remember (or may not even have been born) when memory was thousands of dollars a kilobyte, or the embedded world, where saving 1 cent can return ten million dollars (1 billion watches).

Pete
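
For readers who have not seen the trick of unrolling a loop and calculating the jump into the unrolled sequence, here is a minimal sketch in C. It is not code from the post: the function names sum_rolled and sum_unrolled4 and the unroll factor of 4 are assumptions chosen for illustration, and the switch statement stands in for the computed jump (the pattern usually called Duff's device).

```c
#include <assert.h>
#include <stddef.h>

/* Rolled loop: one loop test and conditional jump per element. */
long sum_rolled(const int *data, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += data[i];
    return total;
}

/* Unrolled four times (hypothetical factor). The switch is the single
 * computed jump into the middle of the unrolled body; it handles the
 * n % 4 leftover elements on the first pass. After that, the loop-back
 * test runs once per four elements instead of once per element. */
long sum_unrolled4(const int *data, size_t n)
{
    long total = 0;
    size_t passes;

    assert(data != NULL || n == 0);
    if (n == 0)
        return 0;

    passes = (n + 3) / 4;               /* number of trips through the body */
    switch (n % 4) {                    /* one computed jump into the body */
    case 0: do { total += *data++;      /* falls through */
    case 3:      total += *data++;      /* falls through */
    case 2:      total += *data++;      /* falls through */
    case 1:      total += *data++;
            } while (--passes > 0);
    }
    return total;
}
```

Both functions return the same sum for any length. The cost, as the post says, is code size: the unrolled body is four copies of the same statement plus the entry switch.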

To: Saturn V who wrote (5141)	8/15/2000 4:56:31 PM
From: Scumbria

Saturn,

Ref: <Was the 8086 pipelined?> No! But by not being pipelined it had no significant branch or branch misprediction penalty. 80286/80386 were the first 80x86 to have a pipeline, and suffered a branch penalty.

In a non-pipelined processor, the CPU flushes after each instruction, i.e. every single instruction suffers the equivalent of a worst-case branch mispredict. Of course a pipelined processor will run much faster than the non-pipelined 8086!

Willy will suffer a much worse branch misprediction penalty than PIII (in terms of cycles). The IPC will be significantly lower for all normal benchmarks. Intel will of course keep the clock speeds of PIII and PIV from overlapping, to keep this sort of apples-to-apples comparison from occurring.

Scumbria
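
The point about flush cost and IPC can be put into a rough first-order model, sketched here in C. It is an illustration only: the function names effective_cpi and ipc are made up for this sketch, and the model assumes every instruction costs a fixed base number of cycles, with each mispredicted branch adding a fixed flush penalty. Real pipelines are more complicated than this.

```c
#include <assert.h>

/* First-order model (an assumption, not a measurement):
 * average cycles per instruction = base cost
 *   + (fraction of instructions that are branches)
 *   * (fraction of those branches that are mispredicted)
 *   * (cycles lost to each pipeline flush). */
double effective_cpi(double base_cpi, double branch_fraction,
                     double mispredict_rate, double flush_penalty)
{
    assert(base_cpi > 0.0 && flush_penalty >= 0.0);
    assert(branch_fraction >= 0.0 && branch_fraction <= 1.0);
    assert(mispredict_rate >= 0.0 && mispredict_rate <= 1.0);
    return base_cpi + branch_fraction * mispredict_rate * flush_penalty;
}

/* Instructions per cycle is the reciprocal of cycles per instruction. */
double ipc(double cpi)
{
    assert(cpi > 0.0);
    return 1.0 / cpi;
}
```

With made-up inputs, say 20% branches and a 10% mispredict rate, doubling the flush penalty from 10 to 20 cycles raises the modeled CPI and so lowers IPC. That is the direction of the Willy versus PIII comparison, not its size. Setting both branch_fraction and mispredict_rate to 1.0 models the non-pipelined case, where every instruction pays the flush cost.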