Charles, <the race is lost at least until Willamette.> I have some doubts here. We know that frequency "turbocharging" is done by superpipelining, i.e. increasing the number of CPU pipeline stages. However, there is a limit: you can't make stages shorter than a certain number of gate delays - you still need to perform some "atomic" logic operations that cannot be subdivided any further. You can't break them down into one or two NAND gates. Therefore, for a given process technology, the frequency must be limited by these "atomic" functions, plus interconnect delays on long routes. In addition, longer pipelines have an adverse effect on general code performance.
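To make the ceiling argument concrete, here is a back-of-the-envelope sketch. All numbers (gate delay, latch/skew overhead) are my own illustrative assumptions for a late-1990s process, not measured data; the point is only that the per-stage overhead plus the indivisible "atomic" logic depth puts a hard cap on clock frequency:

```python
# Toy model: cycle time = (gates per stage * gate delay) + latch/skew
# overhead that cannot be pipelined away. Numbers are assumptions.

def max_frequency_ghz(gates_per_stage, gate_delay_ps=30.0, overhead_ps=60.0):
    """Upper bound on clock frequency for a given logic depth per stage."""
    period_ps = gates_per_stage * gate_delay_ps + overhead_ps
    return 1000.0 / period_ps  # 1000 ps period corresponds to 1 GHz

# Halving stage depth does NOT double frequency, because the fixed
# overhead stays; and you cannot go below the "atomic" depth at all.
for depth in (20, 15, 10, 8):
    print(depth, "gates/stage ->", round(max_frequency_ghz(depth), 2), "GHz")
```

Note how the overhead term makes returns sub-linear well before you hit the one-or-two-NAND-gate floor.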
Are there any indications that there is any significant room for super-super-pipelining left in Willamette? Why do we think that Willamette can break the laws of digital logic and physics?
If not, and if the optimal pipeline length for the x86 architecture has been found to be somewhere around 10-15 stages, there is little reason to believe that any other re-shuffling or partitioning of the prefetch, decode, execute, and retire functions will result in a much faster and better-performing x86 CPU than the AMD Athlon or Intel CuMine.
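The trade-off behind the "optimal pipeline length" claim can be sketched as a toy model. Every parameter below (gate counts, branch rate, predictor miss rate, per-stage overhead) is an assumption I made up for illustration, not a characterization of any real CPU; the mechanism it shows is simply that deeper pipelines raise frequency but also lengthen the misprediction flush, so time per instruction stops improving and eventually gets worse:

```python
# Toy model of pipeline-depth trade-off. All parameters are illustrative
# assumptions, not measurements.

GATE_DELAY_PS = 30.0      # assumed per-gate delay for the process
TOTAL_LOGIC_GATES = 160   # assumed total gate depth of the critical path
OVERHEAD_PS = 60.0        # assumed latch + clock-skew overhead per stage
BRANCH_RATE = 0.2         # assumed: one branch per five instructions
MISPREDICT_RATE = 0.1     # assumed branch predictor miss rate
BASE_CPI = 0.5            # assumed CPI with no misprediction stalls

def ns_per_instruction(stages):
    """Average time per instruction at a given pipeline depth."""
    cycle_ps = (TOTAL_LOGIC_GATES / stages) * GATE_DELAY_PS + OVERHEAD_PS
    # A mispredicted branch flushes roughly the whole pipeline,
    # so the stall penalty grows linearly with depth.
    cpi = BASE_CPI + BRANCH_RATE * MISPREDICT_RATE * stages
    return cycle_ps * cpi / 1000.0

for stages in (5, 10, 15, 20, 30, 60):
    print(stages, "stages ->", round(ns_per_instruction(stages), 3), "ns/instr")
```

With these made-up numbers the curve flattens out and eventually turns back up; real studies that also account for cache misses and wire delay put the sweet spot much shallower, which is the 10-15 stage figure argued above.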
Any thoughts? (Ignorant morons like Elmer, Paul, and Youseless are asked not to bother.)