Technology Stocks : Intel Corporation (INTC)


To: Joe NYC who wrote (150876)12/3/2001 6:52:54 PM
From: wanna_bmw
 
Joe, Re: "But you may be actually right in the way you understood it, which is that the decoder may be the Achilles' heel of P4, the reason the performance does not live up to the clock speed."

If that were the case, Intel would have put everything into creating a solution. Increasing the performance of any single unit is solvable given enough time and effort: double its speed, and the problem goes away. But would it be so sophomoric as to believe that all the woes of the Pentium 4 can be traced to one slow unit?

Somehow, I don't think the solution is that simple. I think that the decoder fits just fine with the micro-architecture, as it was always planned. I might believe the caches run at half the speed, but I can't see what would be limiting about the decoder - doesn't make sense.

Rather, I think the penalty in IPC comes fundamentally from the longer pipeline, and from the way data aligns within the CPU. You see, applications today are still optimized around the P5 and P6 micro-architectures, which had far smaller cache lines. To move things between caches most efficiently, you want data that is cache-line sized. Not only that, but plenty of legacy applications still work with data that's smaller than 32 bits. When data like that flows through a CPU's out-of-order engine (OOOE), it can't align to create any kind of serious throughput. It's like putting a bunch of minivans on a speedway.

There are also the serious penalties from pipeline flushes, cache misses, and branch mispredicts. A 20-stage pipeline simply has more stages to refill before it is full again. These aren't foreign concepts - we've known about them for a long time - but for some reason, people like Kapkan now choose to ignore them and go hunting for more obscure conclusions.

I see that Kap is already trying to prove that his engineered application is a "popular" piece of code. That's a huge load of sh!t, and I could spend a whole new discussion telling you why. Suffice it to say, if you aim to hit a performance penalty, you are likely to find one on the Pentium 4. Fortunately, optimizing for the architecture is popular too. Somehow, I think the future is geared more toward optimizing than toward fault-finding - what do you think?

wbmw