pete, <If all the program fits into the L1 that is the best case for the P4 over the P3, not the worst case>
If we measure by percentage advantage on the benchmark, you are right, but if we measure by P3-equivalent MHz, I think the P4 will look "better" in business benchmarks.
Maybe someone else should check my logic, but here goes. Suppose that on a CPU-only benchmark that fits in L1, say CPUMark98, a P4 1.4 GHz beats a P3 1 GHz by only 12% because its IPC is 20% lower (0.8 x 1400 MHz = 1120 equivalent P3 MHz). So we can say that for this type of benchmark, the P4 1.4 GHz is equivalent to a P3 1.12 GHz, meaning that the 1.13 GHz P3 would beat it.
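For anyone who wants to check the arithmetic, here is a minimal sketch of the "equivalent P3 MHz" calculation. The function name is mine, and the 0.8 relative IPC is just the supposition from the paragraph above, not a measured figure.

```python
def p3_equivalent_mhz(p4_mhz, relative_ipc):
    """P3 clock (in MHz) that would match a P4 on a purely CPU-bound test.

    relative_ipc is the P4's IPC as a fraction of the P3's
    (0.8 here is the assumed value from the post, not a measurement).
    """
    return p4_mhz * relative_ipc


equiv = p3_equivalent_mhz(1400, 0.8)
print(f"P4 1.4 GHz is roughly a P3 at {equiv:.0f} MHz")

# Under this assumption, a 1130 MHz P3 comes out slightly ahead.
print("P3 1.13 GHz wins:", 1130 > equiv)
```

Changing the second argument shows how sensitive the conclusion is to the assumed IPC gap.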
On a benchmark that has two other main bottlenecks, memory access time and I/O time, the CPU can mispredict a branch once in a while and not suffer any consequences, because it's waiting for something else anyway. IOW, the more potential causes of delay there are in completing a benchmark, the less often any one of them will be the most significant.
So that might make a P4 1.4 GHz appear to be as fast as a P3 1.2 GHz, instead of being as fast as a P3 1.12 GHz CPU.
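Here is a toy model of that effect, purely as a sketch. It assumes the P4's IPC deficit shows up as extra time per instruction, and that some fraction of that extra time (hidden_fraction, a made-up parameter) overlaps with memory or I/O waits and so costs nothing. Neither the function nor the parameter comes from any real benchmark.

```python
def effective_equivalent_mhz(p4_mhz, relative_ipc, hidden_fraction):
    """P3-equivalent MHz when part of the P4's stall time is hidden.

    relative_ipc:    P4 IPC as a fraction of P3 IPC (assumed, e.g. 0.8)
    hidden_fraction: share of the P4's extra stall time that overlaps
                     with memory/I/O waits (hypothetical, 0.0 to 1.0)
    """
    # Extra time per instruction versus a P3 at the same clock.
    extra = 1.0 / relative_ipc - 1.0
    # Only the part not hidden behind other waits actually slows things down.
    visible = extra * (1.0 - hidden_fraction)
    return p4_mhz / (1.0 + visible)


for hidden in (0.0, 1.0 / 3.0, 1.0):
    mhz = effective_equivalent_mhz(1400, 0.8, hidden)
    print(f"hidden={hidden:.2f} -> about {mhz:.0f} P3-equivalent MHz")
```

In this model, hiding nothing gives the 1.12 GHz figure, and hiding about a third of the extra stall time is enough to reach the 1.2 GHz figure. Whether a real business benchmark hides that much is an open question.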
Petz