wbmw,
If that were the case, Intel would have put everything into creating a solution to this. The performance of any one area can be improved given enough time and effort. Double the decoder's speed, and the problem is solved.
Not if the decoder is the single bottleneck limiting the clock speeds.
Rather, I think the penalty in IPC comes fundamentally from the longer pipeline, and the way data aligns within the CPU.
Agreed. And of course a trace cache (TC) miss lengthens the pipeline by some 6 stages, to 26.
I see that Kap is already trying to prove that his engineered application is a "popular" piece of code. That's a huge load of sh!t, and I could fill a whole new discussion telling you why.
I think he is trying to prove that the speed of the P4 decoder is 1/2 of the nominal clock. Why do you keep changing the subject?
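For what it's worth, here is the kind of microbenchmark that would show it. This is a minimal sketch of my own (not Kap's actual code, and the file name, repetition counts, and rdtsc timing are all illustrative assumptions): it compares a tiny loop that stays resident in the P4's trace cache against a large straight-line body that should overflow the ~12K-uop trace cache, putting the single decoder back on the critical path for every pass.

```c
/* decode_bench.c -- hypothetical sketch, x86 only, gcc/clang.
 * Build with optimization off (e.g. gcc -O0) so the macro-expanded
 * bodies survive into the generated code. */
#include <stdio.h>
#include <stdint.h>

/* Read the time-stamp counter (works on i386 and x86-64). */
static inline uint64_t rdtsc(void) {
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
}

/* Two cheap, dependent ALU ops; repeated to build a huge code footprint. */
#define OP      a += b; b ^= a;
#define OP10    OP OP OP OP OP OP OP OP OP OP
#define OP100   OP10 OP10 OP10 OP10 OP10 OP10 OP10 OP10 OP10 OP10
#define OP1000  OP100 OP100 OP100 OP100 OP100 OP100 OP100 OP100 OP100 OP100
#define OP10000 OP1000 OP1000 OP1000 OP1000 OP1000 \
                OP1000 OP1000 OP1000 OP1000 OP1000

int main(int argc, char **argv) {
    uint32_t a = (uint32_t)argc, b = a + 1;  /* defeat constant folding */
    uint64_t t0, t1;
    int i;
    (void)argv;

    /* Case 1: 1M passes over a 2-instruction body -- trace-cache resident,
     * so the decoder is off the critical path after the first pass. */
    t0 = rdtsc();
    for (i = 0; i < 1000000; i++) { OP }
    t1 = rdtsc();
    printf("small loop: %.2f cycles/op\n", (double)(t1 - t0) / 1000000.0);

    /* Case 2: 100 passes over a ~20,000-instruction body -- the same total
     * work, but the body is too big to stay cached as traces, so it must
     * be re-fetched and re-decoded on every pass. */
    t0 = rdtsc();
    for (i = 0; i < 100; i++) { OP10000 }
    t1 = rdtsc();
    printf("large body: %.2f cycles/op\n", (double)(t1 - t0) / 1000000.0);

    printf("(sink: %u)\n", (unsigned)(a ^ b));  /* keep results live */
    return 0;
}
```

If the decoder really sustains only one instruction per two clocks, the large body should come in at roughly double the cycles/op of the small loop, on the same total instruction count.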
Fortunately, optimizing for the architecture is also popular. Somehow, I think the future is more geared towards optimizing than finding faults - what do you think?
I am a programmer, and I have never spent any time optimizing my code for any particular CPU. I take the compiler's output as is, and it runs just fine on any CPU: Pentium, K6, Athlon, etc. If the performance is not good, I re-examine the logic and improve it, and I get an equal improvement across the whole range of CPUs.
I don't see myself spending any of my time figuring out the pathologies of the P4 on code generated by standard compilers, like the Visual Studio suite, or by the popular runtime environments: Java, VB, the CLR.
Joe