but the contribution will be CPU cycles, and in single digits, as far as CPU cycles are concerned.
That is interesting about helping loads from cache, but reordering can move a load MANY instructions back, as I understand it, which could be 10-20 cycles or so, let's say.
Also, K8's main memory latency advantage is projected to shrink:
"With the numbers available to us now, we have reason to believe that the Athlon 64 X2's latency advantage will shrink to only 15 to 20%. For comparison, the memory subsystem of the Pentium 4 was almost twice as slow as the Athlon 64 (80-90 ns versus 45-50 ns)."
So that would put Core2 at, say, 60ns vs 50ns for K8, for main memory latency.
Converting to cycles at 2.6GHz: 1 cycle = ~0.38 ns, so:
Core2: ~158 cycles
K8: ~132 cycles
If load reordering can save 10-20 cycles, that's a large portion of the difference between the two systems even during cache misses.
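For anyone who wants to check the arithmetic, here is a rough sketch of the conversion. The 60 ns and 50 ns latencies and the 10-20 cycle reordering win are my guesses from above, not measurements, and the function names are just made up for illustration. Without rounding the cycle time to 0.38 ns you get about 156 and 130 cycles rather than 158 and 132, but the gap is about 26 cycles either way.

```python
# Back-of-the-envelope check. All inputs are assumed figures from this post,
# not measured values.
CLOCK_GHZ = 2.6


def ns_to_cycles(latency_ns, clock_ghz=CLOCK_GHZ):
    """Convert a latency in nanoseconds to clock cycles (GHz = cycles per ns)."""
    return latency_ns * clock_ghz


def share_of_gap(cycles_saved, gap_cycles):
    """Fraction of the latency gap that a given reordering win would cover."""
    return cycles_saved / gap_cycles


core2_cycles = ns_to_cycles(60)   # assumed Core2 main memory latency
k8_cycles = ns_to_cycles(50)      # assumed K8 main memory latency
gap_cycles = core2_cycles - k8_cycles

print(f"Core2: {core2_cycles:.0f} cycles, K8: {k8_cycles:.0f} cycles, "
      f"gap: {gap_cycles:.0f} cycles")

# Guessed range for the reordering win, in cycles.
for saved in (10, 20):
    print(f"{saved} cycles saved covers {share_of_gap(saved, gap_cycles):.0%} of the gap")
```

So under these assumptions a 10-20 cycle win would cover somewhere between a bit over a third and about three quarters of the gap.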
Now, I don't know what the average reordering win is, and you may be right anyhow, that the biggest contribution comes from cache hit reordering.