To: dougSF30 who wrote (196987)    5/16/2006 12:42:07 AM
From: Joe NYC

Doug,

"Allowing loads to move ahead of stores gives a big performance boost. In some snippets of benchmarking code, Intel saw up to a 40% performance boost, solely the result of the more flexible way Loads get reordered."

I think we are mixing up concepts that are quite different. Load reordering / memory disambiguation will contribute (I really like this, BTW), but its contribution will be measured in CPU cycles, and in single digits at that. We were talking about main memory latency, and there we are talking about 10s and 100s of CPU cycles. If AMD's latency is, say, 80 CPU cycles lower, memory disambiguation is peanuts by comparison.

Where load reordering helps a great deal is in cutting a cycle or 2 off an average load. That advantage contributes (significantly) to loads from cache, where the access may go from, for example, a 9 cycle average to a 7 or 8 cycle average.

So you may have 100 loads; of them, 97 are from caches and 3 are cache misses. Let's complete a hypothetical scenario.

                  No IMC        No IMC        IMC
                  No Load       Load          No Load
                  Reordering    Reordering    Reordering

L1-L2 hits        97% * 9       97% * 7       97% * 9
cache misses       3% * 200      3% * 198      3% * 140
--------------------------------------------------------
cycles per
100 loads         1,473         1,273         1,293

As you can see, on cache misses, load reordering is basically irrelevant.

The cases I was talking about, where Opteron will have an advantage, are apps where the cache hit rate drops significantly below the 97% I used in my hypothetical scenario. If it drops to, say, 75%, load reordering is irrelevant. But Intel has other weapons for those scenarios: a larger L2 and prefetching, both of which will help in many apps. It's only when Conroe's latency-reduction technologies are at their wits' end that Opteron's IMC takes over.

Now, obviously, the ideal thing is to have load reordering, great prefetchers and IMC.
It would be great if K8L had it, but realistically speaking, it is an extremely long shot to expect it to be in K8L. There may be some optimization of the prefetcher and of the memory controller, but implementing load reordering is probably best left for the next, more major revision (or a whole new core design).

Joe
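
For anyone who wants to play with the hypothetical scenario above, here is a minimal sketch in Python that redoes the arithmetic. The latencies are the made-up figures from the table (9 or 7 cycles for a cache hit; 200, 198 or 140 cycles for a miss), not measurements, and the names SCENARIOS, cycles_per_100_loads and compare are just labels chosen for this illustration.

```python
# Hypothetical latencies in CPU cycles, copied from the scenario in the post:
# (average cycles for an L1/L2 hit, average cycles for a cache miss).
SCENARIOS = {
    "no IMC, no load reordering": (9, 200),
    "no IMC, load reordering": (7, 198),
    "IMC, no load reordering": (9, 140),
}


def cycles_per_100_loads(hit_pct, hit_cycles, miss_cycles):
    """Total cycles spent on 100 loads when hit_pct of them hit in cache."""
    return hit_pct * hit_cycles + (100 - hit_pct) * miss_cycles


def compare(hit_pct):
    """Return the per-100-loads cost of every scenario at one hit rate."""
    return {
        name: cycles_per_100_loads(hit_pct, hit, miss)
        for name, (hit, miss) in SCENARIOS.items()
    }


if __name__ == "__main__":
    # 97% is the hit rate used in the table; 75% is the low-hit-rate case.
    for hit_pct in (97, 75):
        print(f"cache hit rate {hit_pct}%")
        for name, total in compare(hit_pct).items():
            print(f"  {name:<28} {total:>6,} cycles per 100 loads")
```

At a 97% hit rate this reproduces the 1,473 / 1,273 / 1,293 totals. At 75%, with the same assumed latencies, the three columns come out to 5,675 / 5,475 / 4,175: reordering still saves the same 200 cycles per 100 loads, while the lower miss latency saves 1,500.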