Kap's program.
Let's assume that Kap's program shows what he claims it shows. I am not totally convinced because there are some variables that aren't controlled for, but I'll admit that it does lend some support to his conclusion. But it isn't fraud on the part of Intel, although it might be a PR nightmare.
The whole point of the trace cache is to decouple the complex, and potentially slow, x86 decode logic from the execution pipeline. Call it tacit recognition that x86 should have died a long time ago, and that RISC is really a better way to go. Now the downside is that the trace cache has to be large enough that the decoder is hit only rarely. The problem here is that on every other x86, a technique like loop unrolling was a good way to speed up critical code, because it cut the number of branches and so the exposure to branch misprediction. But unrolling also bloats the instruction footprint, and on a P4 that bloat can spill out of the trace cache and throw execution back onto the slow decoder, while a tight loop sits in the trace cache and replays already-decoded uops. In other words, straight-through execution runs better on everything that isn't a P4, but P4s like loops. So some optimizations for the P4 are pathological for everything else, and vice versa.
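To make that concrete, here is a minimal C sketch (the function names are mine, not anything from Kap's program): the rolled loop pays a compare-and-branch per element but stays tiny, while the 4x-unrolled version takes a quarter of the branches at roughly four times the loop body. On most x86s the unrolled form tends to win; on a P4, that larger footprint is exactly the sort of thing that competes for trace cache space.

/* Rolled: one compare-and-branch per element, minimal code size. */
long sum_rolled(const long *a, long n)
{
    long s = 0;
    for (long i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Unrolled 4x: one branch per four elements, about 4x the loop body.
   Assumes n is a multiple of 4, just to keep the sketch short. */
long sum_unrolled(const long *a, long n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    for (long i = 0; i < n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }
    return s0 + s1 + s2 + s3;
}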
But this tells us nothing new, and it doesn't constitute fraud on the part of Intel. It is a clever way to eliminate a performance bottleneck. It does depend on the trace cache being large enough to hold a useful number of instructions, and it will likely complicate an SMT implementation. But it is a valid, non-fraudulent design technique, as long as the pluses and minuses are known. I suspect that at some point AMD will implement something similar; in fact, there may be something that looks superficially like this in the Hammers. Consider that there is nothing keeping a designer from including decoders for different instruction sets that can be switched on the fly. Not on an instruction-by-instruction basis, but on a code-segment-by-code-segment basis, much like the way the Hammers switch modes. Given enough transistors, there is no reason why you couldn't have decoders for x86, x86-64, PowerPC, Alpha, and others. It would solve the legacy software problem, big time.
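For what that per-segment switching might look like, here is a purely hypothetical C sketch of a front end picking a decoder from an ISA tag in the code-segment descriptor. None of these names come from any real Intel or AMD design, and the real Hammers only switch between x86 decode modes this way, not between foreign instruction sets; the point is just that the selection happens at segment granularity, not per instruction.

#include <stdint.h>
#include <stdio.h>

typedef enum { ISA_X86, ISA_X86_64, ISA_POWERPC, ISA_ALPHA } isa_t;

/* Hypothetical code-segment descriptor carrying an ISA tag, the way
   the Hammers carry a 32/64-bit mode attribute in the real descriptor. */
typedef struct {
    uint64_t base, limit;
    isa_t    isa;
} code_segment_t;

/* Stub decoders; a real front end would emit micro-ops here. */
static void decode_x86(const uint8_t *b)     { printf("x86 decode of %02x\n", b[0]); }
static void decode_x86_64(const uint8_t *b)  { printf("x86-64 decode of %02x\n", b[0]); }
static void decode_powerpc(const uint8_t *b) { printf("PowerPC decode of %02x\n", b[0]); }
static void decode_alpha(const uint8_t *b)   { printf("Alpha decode of %02x\n", b[0]); }

/* The decoder is chosen when control enters a segment, not per instruction. */
static void decode(const code_segment_t *cs, const uint8_t *bytes)
{
    switch (cs->isa) {
    case ISA_X86:     decode_x86(bytes);     break;
    case ISA_X86_64:  decode_x86_64(bytes);  break;
    case ISA_POWERPC: decode_powerpc(bytes); break;
    case ISA_ALPHA:   decode_alpha(bytes);   break;
    }
}

int main(void)
{
    uint8_t code[] = { 0x90 };  /* an x86 NOP, say */
    code_segment_t legacy = { 0, 0xffff, ISA_X86 };
    code_segment_t other  = { 0, 0xffff, ISA_ALPHA };
    decode(&legacy, code);  /* routed to the x86 decoder */
    decode(&other, code);   /* same bytes, routed to the Alpha decoder */
    return 0;
}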