

To: dougSF30 who wrote (196987) 5/16/2006 12:42:07 AM
From: Joe NYC
 
Doug,

Allowing loads to move ahead of stores gives a big performance boost. In some snippets of benchmarking code, Intel saw up to a 40% performance boost, solely as a result of the more flexible way loads get reordered.

I think we are mixing up concepts that are quite different. The load reordering / memory disambiguation will contribute (I really like this, BTW), but that contribution is measured in CPU cycles, and in single digits as far as CPU cycles are concerned.
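To make the concept concrete, here is a rough C sketch (my own illustration, nothing from Intel's benchmarks) of the ordering problem memory disambiguation addresses: a store whose address is only known at run time, followed by a load that usually does not alias it.

/* Minimal sketch: the store's address depends on idx, so a conservative
 * out-of-order core must hold the following load until the store address
 * resolves.  A core with memory disambiguation can predict "no alias",
 * issue the load early, and replay only if the prediction turns out wrong
 * (as in the second call below). */
#include <stdio.h>

static int table[256];

int store_then_load(int idx, int other)
{
    table[idx] = 1;        /* store to an address computed at run time      */
    return table[other];   /* load that usually does NOT alias the store    */
}

int main(void)
{
    printf("%d\n", store_then_load(3, 7));  /* prints 0: no alias            */
    printf("%d\n", store_then_load(5, 5));  /* prints 1: the aliasing case   */
    return 0;
}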

We were talking about main memory latency, and there we are talking about tens and hundreds of CPU cycles. If AMD's latency is, say, 80 CPU cycles lower, the memory disambiguation is peanuts by comparison.

Where load reordering helps a great deal is in cutting a cycle or two off the average load. That advantage contributes (significantly) to loads served from cache, where the average access may go from, for example, 9 cycles down to 7 or 8 cycles.

So you may have 100 loads: 97 of them hit in the caches and 3 are cache misses. Let's complete a hypothetical scenario.

                     No IMC,           No IMC,           IMC,
                     no reordering     load reordering   no reordering
L1/L2 hits (97%)     97 *   9 =  873   97 *   7 =  679   97 *   9 =  873
Cache misses (3%)     3 * 200 =  600    3 * 198 =  594    3 * 140 =  420
--------------------------------------------------------------------------
Total cycles                   1,473             1,273             1,293
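For anyone who wants to check the arithmetic, here is a quick C program that reproduces the totals using the same assumed latencies (these are the hypothetical numbers from the table, not measurements):

/* Total cycles for 100 loads = hits * hit latency + misses * miss latency */
#include <stdio.h>

static int total_cycles(int hits, int hit_lat, int misses, int miss_lat)
{
    return hits * hit_lat + misses * miss_lat;
}

int main(void)
{
    printf("No IMC, no reordering:   %d\n", total_cycles(97, 9, 3, 200)); /* 1473 */
    printf("No IMC, load reordering: %d\n", total_cycles(97, 7, 3, 198)); /* 1273 */
    printf("IMC, no reordering:      %d\n", total_cycles(97, 9, 3, 140)); /* 1293 */
    return 0;
}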


As you can see, on cache misses, load reordering is basically irrelevant. The examples I was talking about, where Opteron will have an advantage, are apps where the cache hit rate drops significantly below the 97% I used in my hypothetical scenario. If it drops to, say, 75%, load reordering becomes irrelevant next to the miss penalty.
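Redoing the same back-of-the-envelope math at a 75% hit rate (again, purely hypothetical numbers, keeping the per-access latencies from the table above) shows why: load reordering saves about 200 cycles per 100 loads, while the IMC saves about 1,500.

/* 75 hits and 25 misses per 100 loads, same assumed latencies as above */
#include <stdio.h>

int main(void)
{
    printf("No IMC, no reordering:   %d\n", 75 * 9 + 25 * 200);  /* 5675 */
    printf("No IMC, load reordering: %d\n", 75 * 7 + 25 * 198);  /* 5475 */
    printf("IMC, no reordering:      %d\n", 75 * 9 + 25 * 140);  /* 4175 */
    return 0;
}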

But Intel has other weapons for those scenarios - a larger L2 and prefetching, both of which will help in many apps. It's only when Conroe's latency-reduction technologies run out of steam that Opteron's IMC takes over.

Now, obviously, the ideal thing is to have load reordering, great prefetchers and an IMC. It would be great if K8L had all three, but realistically speaking, it is an extremely long shot to expect load reordering to be in K8L. There may be some optimization of the prefetcher and of the memory controller, but implementing load reordering is probably best left for the next, more major revision (or a whole new core design).

Joe