To: Ali Chen who wrote (227273) 3/4/2007 12:49:09 PM
From: DDB_WO

Ali - The attempt to reduce main memory latency is indeed a noble move. However, as I said many times, the advantage of AMD's approach to memory handling is highly exaggerated. You need to look at the overall effect of the whole memory subsystem, including the art of hardware prefetch, the quality of software prefetch, cache miss rates, and FSB/memory penalties.

Well, already the first Opterons had a measurable advantage in memory latency compared to K7. And this was likely one of the main reasons that these CPUs could still compete, even with much lower L1/L2 bandwidth, a smaller L2, less capable prefetchers (the first update came with rev. E, as announced by McGrath in his Stanford talk), an inefficient SSE implementation, etc. So I think it's not correct to say that the IMC had no positive effect at all. But it's OK to assume, as you did, that its effect is not the maximum of what could have been achieved.

It's always the same: the companies (esp. the smaller ones) have limited R&D resources and have to decide which cards to play. One company chose to first improve on RAM (RDRAM), NB, FSB, prefetching and L2 cache size, while the other only went with RAM (DDR) and NB (integrating it into the CPU) first, while also increasing the L2 cache size.

It's not a guessing game for them, since hardware changes faster than the huge amount of software out there. If you look even at somewhat old data (home.austin.rr.com), the bottom line is that AMD already has a 50-60% disadvantage in overall memory waste traffic, even compared to the old Pentium D with an older chipset and slower memory; compare line (4) with lines (6) or (7). It translates into a 25-30% loss in overall performance. The data are for SPEC2000, which has smaller data sets than the newer SPEC2006, so the gap must be more pronounced in 2006.
But even if the AMD statement is true (100 ns vs. 70/55 ns), which I doubt (their base might be quite obsolete), their latency effort is in the right direction.

An interesting chart. It clearly shows that K8 would need to increase its clock faster to keep up with the SC Prescott having 2 MB L2. At ~6 GHz they would have about the same SPECfp_base2k performance. There is no Pentium D in the chart (typo?). This 25-30% loss would, however, not happen in reality as long as we are that far away from THz frequencies. However, the steepness and position of the 2M Prescott line vs. that of the (what I assume) 1M Prescott [3] line shows some effect of the L2 alone. The FX is the fastest x86 CPU there, but maybe also thanks to unregistered DDR RAM. However, with 2 samples it's already unreliable to extrapolate this way. With one sample it's obviously impossible.

I'm just wondering about this 55 ns number, since latencies of current K8s are already that low. IIRC, the P4 had latencies of ~100+ ns in RightMark Memory Analyzer results, while Dothan and Yonah are already in the ~50-70 ns region for random accesses. There are also higher numbers for K8 and Yonah depending on the size of the arrays. Having to close and open DRAM pages really adds a lot to latency.

But in Barcelona, a direct prefetch to L1 would save 16 or so cycles vs. K8, and the improved prefetching itself plus the IMC prefetcher, optimized page conflict handling and other improvements in the memory subsystem will have effects which are rather difficult to predict. You could browse AMD's patents in this regard, but they also don't tell every detail.

Regards,
Matt