Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: pgerassi who wrote (5183) 8/15/2000 9:01:52 PM
From: Saturn V
 
Dear Pete,
I agree that "out of order execution" improved the IPC of the Pentium II.

You and Scumbria are making a big deal of branch misprediction. Most CPU time is spent in one loop or another, so branch prediction will work pretty well in any piece of repetitive code (correct prediction 95-98%). The only problem is a branch encountered for the first time (correct prediction about 50%). How often is that? Obviously it will depend on the benchmark, i.e. how much time is spent in "repetitive code" vs. "non-repetitive code".
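
To put rough numbers on that, here is a minimal sketch of the blended prediction rate. The 95% and 50% figures are the ones quoted above; the split between loop branches and first-time branches is a made-up input, not a measurement of any real benchmark.

```python
def blended_accuracy(loop_fraction, loop_accuracy=0.95, first_time_accuracy=0.50):
    """Overall fraction of branches predicted correctly.

    loop_fraction is the (hypothetical) share of executed branches that
    sit in repetitive code; the remainder are seen for the first time.
    """
    if not 0.0 <= loop_fraction <= 1.0:
        raise ValueError("loop_fraction must be between 0 and 1")
    return (loop_fraction * loop_accuracy
            + (1.0 - loop_fraction) * first_time_accuracy)


if __name__ == "__main__":
    # Illustrative splits only; real programs would have to be profiled.
    for fraction in (0.50, 0.90, 0.99):
        print(f"{fraction:.0%} loop branches -> "
              f"{blended_accuracy(fraction):.1%} predicted correctly")
```

The closer the loop share gets to 100%, the closer the blended rate gets to the loop figure, which is the whole argument in one line of arithmetic.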

In most CPU-hungry applications repetitive code dominates. Examples are graphics, image processing, signal processing, etc. The non-repetitive code is typically found in the prolog and the epilog of the application; the main act of most computer programs is loops, loops and loops.

So the branch mispredict penalty, if any, would be washed away by the other enhancements of the P4 family, just as happened in the Pentium II.
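
For anyone who wants to play with the trade-off, a back-of-the-envelope sketch follows. It uses the usual textbook approximation that each mispredicted branch adds a fixed number of stall cycles. Every number in it (base CPI, branch share, penalty cycles) is a placeholder I picked for illustration, not a P4 or P6 figure.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Cycles per instruction once branch mispredict stalls are added.

    Simple model: every mispredicted branch costs penalty_cycles extra.
    All inputs are hypothetical; none come from a real CPU.
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


if __name__ == "__main__":
    # Placeholder workload: 20% of instructions are branches, 5% mispredicted.
    short_pipe = effective_cpi(1.0, 0.20, 0.05, 10)
    long_pipe = effective_cpi(1.0, 0.20, 0.05, 20)
    print(f"shorter penalty: CPI {short_pipe:.2f}")
    print(f"longer penalty:  CPI {long_pipe:.2f}")
    # Clock speedup the longer pipeline needs just to break even.
    print(f"break-even clock ratio: {long_pipe / short_pipe:.3f}")
```

With a high prediction rate the extra penalty shows up as a modest CPI increase in this model, so the question becomes whether the clock gain and the other enhancements exceed that break-even ratio.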

We can argue ad infinitum on this issue.

I suggest we revisit this issue when the performance metrics of the P4 finally become public.



To: pgerassi who wrote (5183) 8/15/2000 9:21:56 PM
From: jcholewa
 
Re: IPC gains P2 vs Pentium

> Out of order speculative execution is what makes P2 faster than Pentium in addition to superscalar RISC based CPU IN
> SPITE of longer pipeline. Had the P2 just been a longer pipeline, it would not have had more IPC. Each stage has to
> be able to execute more than one instruction at a time for there to be an IPC greater than 1. Pipelining only allows for
> the maximal use of a resource if and only if that resource is the bottleneck of the pipe. Any part that is not the bottleneck
> will not be fully utilized. Thus a single pipeline can only execute one instruction per advance of the pipe. That is what
> IPC, instructions per clock (advance), means. Thus to get a higher than one instruction per clock requires more than one
> pipe. Athlon has 3 decoder pipes plus 9 execution pipes. P2 had three execute pipes and two decode pipes.

I know this is probably somewhat trivial, but shouldn't two other things add to the PII's performance at a given clock? First, the PII had faster level two cache: PII-266's L2 had 33% higher throughput than P55c-266's L2, and the fact that the L2 was on the module instead of on the motherboard might have given it an even greater latency assist. Second, the GTL+ bus supports (I'm told) features which make it inherently faster than S7 boards. I'm slightly talking out my ass here, but a friendly Cyrix fellow once suggested some kind of term like "pipelined transactions", and I keep remembering that.

> Now P4 does not have many more pipes than P3. In fact it
> has less FPU pipes.

P6 had ( azillionmonkeys.com -- if I'm understanding it correctly, and there's no guarantee of that! ) one main pipe for FADD, FMUL, and stuff like that, and a pipe of lesser note for FXCH. P4 has ( watch.impress.co.jp -- I had a better image of this, apologies that I am unable to produce it ) one pipe for FADD/FMUL/Fetc., and one pipe for FSTORE and similar "maintenance" tasks. Isn't this overall correct? Wouldn't it then be more accurate to say that the P4 and P6 have the same number of FPU pipelines (two)?

Of course, if the P6 can run an FSTORE or fp move from elsewhere on the cpu at the same time it runs that FADD or FMUL, then what I'm writing here is totally moot. Doh. :)

Pete: You seem to be an intelligent fellow. I am amazed at how many intelligent fellows are saying stuff along the lines of PIII-1000 ~= P4-1400. I choose to disbelieve this for the sake of my own sanity (and because I have a little more faith in Intel's engineers, if not their x86 management subgroup), but you should probably know that your calculations here are held by others as well.

-JC