Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: Saturn V who wrote (5118) 8/15/2000 2:07:11 PM
From: Scumbria
 
Saturn,

Blah, blah, blah.

Scumbria



To: Saturn V who wrote (5118) 8/15/2000 2:17:20 PM
From: Daniel Schuh
 
Er, some conditional branches inevitably go 50-50, there's a limit to what branch prediction can buy you. As to Willy at the last IDF, my somewhat jaundiced memory is that the Willy presentation was about as meaningful as Elmer's much ballyhooed OEM gigamine systems also present there, up on stage, all boxed and ready to ship.
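
To put a number on the 50-50 point, here is a minimal sketch in Python. It assumes a single 2-bit saturating counter as the predictor (a textbook scheme, not a claim about what any Intel or AMD part actually uses), and the two branch streams are made up for illustration: a coin-flip branch and a loop back-edge that is taken 99 times out of 100.

```python
import random


def two_bit_accuracy(outcomes):
    """Fraction of branches a single 2-bit saturating counter predicts correctly."""
    state = 2  # states 0,1 predict not-taken; 2,3 predict taken (start weakly taken)
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Move the counter toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)


rng = random.Random(0)  # fixed seed so runs are repeatable

# Hypothetical data-dependent branch that goes either way with equal odds.
coin_flip = [rng.random() < 0.5 for _ in range(100_000)]

# Hypothetical loop back-edge: taken 99 times, then falls through once, repeated.
loop_branch = ([True] * 99 + [False]) * 1000

coin_flip_accuracy = two_bit_accuracy(coin_flip)
loop_accuracy = two_bit_accuracy(loop_branch)

print(f"50-50 branch accuracy: {coin_flip_accuracy:.3f}")
print(f"loop branch accuracy:  {loop_accuracy:.3f}")
```

The loop branch is mispredicted only at each loop exit, while the coin-flip branch stays near 50% however the counter is tuned, since there is no pattern in the stream for any predictor to learn.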

I would expect Willy benchmarks to be released soon; in fact, I would have expected them to be leaked already if the launch were imminent. I imagine there's going to have to be some careful management done there, though. Given the deprecated x87 instructions, Quake is liable to suck. But there's no doubt a new Intel compiler to go with Willy that will deliver glowing specmarks, if nothing else.

Cheers, Dan.



To: Saturn V who wrote (5118) 8/15/2000 3:09:37 PM
From: Scumbria
 
Saturn,

Remember the 80286. It had a much longer pipeline than the 8086,

Was the 8086 pipelined?

Scumbria



To: Saturn V who wrote (5118) 8/15/2000 3:26:37 PM
From: pgerassi
 
Dear Saturn:

Re: 286 Pipeline

Saturn, the 286 was not pipelined. It executed microcode and took many clock cycles to complete one instruction. The 287 was not any faster than an equivalently clocked x87. The 286 got its speed from its execution and decode units, where much more was done in hardware than in microcode. Pipelining as it is currently used did not start until the 386, and then it was used first in the FPU (387). The 486 or the Pentium (I do not remember which) was the first in the x86 line to be truly pipelined. The Pentium II, K6, and all subsequent x86 CPUs use RISC cores with hardware decoders instead of true CISC, and these are all pipelined, as the technique was well understood by then.

Embedded code and device drivers make heavy use of data-driven jump tables (vectors). Most branch predictors currently in use cannot reliably predict this kind of branch, so for this type of code pipeline stalls are very frequent. Since this type of code is most prevalent in operating systems and servers, shorter pipelines tend to handle it better than longer ones. This is why, for heavy multi-user tasks, the K6-3 would outrun higher-clocked P2s and P3s. Long pipelines are better when the CPU spends most of its time in an inner loop of some kind, such as transformations, FFTs, and other small-code-over-large-data situations.
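
The jump-table argument can be sketched with a toy model. Everything here is an assumption made for illustration: the handler names are hypothetical, the request mix is uniformly random, and the "predictor" simply guesses that an indirect jump goes to the same target as last time, which is a simplification loosely modeled on a branch target buffer rather than any specific CPU.

```python
import random


# Hypothetical driver-style handlers; the names are illustrative only.
def handle_read():
    return "read"


def handle_write():
    return "write"


def handle_ioctl():
    return "ioctl"


def handle_close():
    return "close"


# Data-driven jump table: the request code indexes straight into the handlers.
JUMP_TABLE = [handle_read, handle_write, handle_ioctl, handle_close]


def last_target_hit_rate(codes):
    """Hit rate of a predictor that guesses the same jump target as last time."""
    last = None
    hits = 0
    for code in codes:
        target = JUMP_TABLE[code]
        if target is last:
            hits += 1
        target()  # the indirect jump through the table
        last = target
    return hits / len(codes)


rng = random.Random(0)  # fixed seed so runs are repeatable

# Assumed server-style workload: request codes arrive in no particular order.
mixed_requests = [rng.randrange(len(JUMP_TABLE)) for _ in range(100_000)]

# Assumed inner-loop workload: the same handler is dispatched every time.
inner_loop = [0] * 100_000

mixed_rate = last_target_hit_rate(mixed_requests)
loop_rate = last_target_hit_rate(inner_loop)

print(f"mixed requests hit rate: {mixed_rate:.3f}")
print(f"inner loop hit rate:     {loop_rate:.5f}")
```

With four equally likely targets the last-target guess is right only about one time in four, so most dispatches would cost a pipeline refill, while the inner-loop stream misses only once. Real request streams are less random than this, so treat the gap as an upper bound on the effect.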

Furthermore, doubling the ALU clock frequency does not speed up the decode pipe. It just makes the pipeline temporarily shorter. This probably provides a small increase in IPC, 1 to 2% at most, except in certain very constrained situations. The Athlon has an IPC of about 2.1, and I believe its pipeline is one or two stages shorter than that of Coppermine, whose IPC is about 1.9. This goes against your initial assumption.

Most believe that the Willamette will pay a penalty for the longer pipeline, but no one yet has a good idea what it is without simulation or empirical data. Current estimates of the hit range from 5% to 50%, with the average being around 15% to 25%. This is not bad if Willamette clocks 40% higher, but it is a disaster if it clocks only 15% higher or less (since overall performance is clock speed times IPC). However, since each doubling of the pipeline has returned less overall speed improvement, sooner or later overall performance will stop gaining and even start to lose ground. IMHO, this limit may be reached (or exceeded) by the Willamette unless significant improvement is made to compiler technology, current coding styles, underlying architectures, or some combination of the above.
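
The clock-times-IPC arithmetic above is easy to tabulate. This is a minimal sketch that takes the post's rule of thumb at face value (performance = clock speed x IPC) and plugs in the clock gains and IPC hits quoted in the post; the function names are made up for illustration and nothing here is measured data.

```python
def relative_performance(clock_gain, ipc_loss):
    """Performance relative to the baseline, taking performance = clock x IPC."""
    return (1.0 + clock_gain) * (1.0 - ipc_loss)


def break_even_clock_gain(ipc_loss):
    """Clock gain needed just to cancel out a given fractional IPC loss."""
    return 1.0 / (1.0 - ipc_loss) - 1.0


# Clock gains and IPC hits quoted in the post.
for clock_gain in (0.40, 0.15):
    for ipc_loss in (0.05, 0.15, 0.25, 0.50):
        perf = relative_performance(clock_gain, ipc_loss)
        print(f"clock +{clock_gain:.0%}, IPC -{ipc_loss:.0%}: {perf:.4f}x baseline")

for ipc_loss in (0.15, 0.25):
    needed = break_even_clock_gain(ipc_loss)
    print(f"IPC -{ipc_loss:.0%} needs clock +{needed:.1%} to break even")
```

With a 40% clock gain, a 15% to 25% IPC hit still leaves a net gain of roughly 5% to 19%. With only a 15% clock gain, the same hits put the chip about 2% to 14% behind the baseline, which is the "disaster" case.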

Intel may have gone too far, but one will not know that, if one does not try it. We shall see, when we get samples to test.

Pete



To: Saturn V who wrote (5118) 8/15/2000 7:51:38 PM
From: Dan3
 
Re: Remember the 80286. It had a much longer pipeline than the 8086, and a much larger branch mispredict penalty than the 8086.

Yes, and I also remember it had full speed DRAM.

Do you know why that is relevant?

Dan