Joe, I reread the interview with Richard Brown from VIA, the one posted by WBMW. I think Brown has it backwards, probably because he's pumping up VIA's own products.
The CPU is always sensitive to latency, much more so than any other component in the system. Putting the memory controller right on the CPU die helps to lower that latency. I believe this is where AMD will experience the majority of their estimated performance gains with Hammer.
Meanwhile, moving the memory controller away from AGP and other I/O doesn't affect performance all that much, because I/O cares more about bandwidth than latency. The exception would be nForce-like integrated graphics, but integrated graphics are never meant to have cutting-edge performance, not even nForce.
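Here's a quick back-of-envelope sketch of why that split matters. The numbers below (DRAM round-trip latency, bus bandwidth, line size) are illustrative assumptions of mine, not measurements of any real chipset:

```python
# Back-of-envelope: why latency dominates for dependent CPU loads but
# bandwidth dominates for streaming I/O. All numbers are assumed for
# illustration, not taken from any real part.

LATENCY_NS = 120.0       # assumed round-trip to DRAM through a northbridge
BANDWIDTH_GBPS = 2.1     # assumed memory bandwidth (roughly PC2100 DDR)
LINE_BYTES = 64          # cache-line size
N = 1_000_000            # lines transferred

# Dependent loads (pointer chasing): each access waits on the previous
# one, so total time is N times the full latency.
latency_bound_ms = N * LATENCY_NS / 1e6

# Streaming transfer: accesses are pipelined, so total time is just
# bytes over bandwidth; per-access latency is amortized away.
bandwidth_bound_ms = (N * LINE_BYTES) / (BANDWIDTH_GBPS * 1e9) * 1e3

print(f"latency-bound:   {latency_bound_ms:.1f} ms")
print(f"bandwidth-bound: {bandwidth_bound_ms:.1f} ms")
```

With these made-up numbers the latency-bound case takes about four times as long for the same data moved, which is why shaving latency with an on-die controller helps the CPU far more than it helps AGP.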
All of the above is true for single-processor systems. However, when you move to multiprocessor systems, it's a slightly different ballgame. The good news about the Hammer design is that memory bandwidth scales with the number of CPUs. The bad news is that the Hammer approach is much tougher to implement than a shared-bus architecture like that of the Pentium 4, Itanium, or McKinley. Not only that, but without ccNUMA support in your OS, average latency will suffer, because only a fraction of your memory accesses can be satisfied by the on-die memory controller. ccNUMA support is usually a feature exclusive to high-end server OSes.
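To put a rough number on that latency penalty, here's a toy model of a 4-way Hammer-style box. The local and remote latencies and the 90% locality figure are assumptions I picked for illustration:

```python
# Rough model of average memory latency in a 4-way Hammer-style system.
# Both latency figures and the locality fraction are assumed, not measured.

LOCAL_NS = 80.0     # assumed latency via the on-die controller
REMOTE_NS = 140.0   # assumed latency for a fetch from another node's memory
CPUS = 4

def avg_latency(local_fraction):
    """Average latency given the fraction of accesses that stay local."""
    return local_fraction * LOCAL_NS + (1 - local_fraction) * REMOTE_NS

# A non-NUMA-aware OS places pages blindly, so on average only 1/CPUS
# of a processor's accesses hit its own controller.
naive = avg_latency(1 / CPUS)

# A ccNUMA-aware OS tries to keep a thread's pages on its own node;
# assume it achieves, say, 90% locality.
aware = avg_latency(0.9)

print(f"naive placement: {naive:.0f} ns")
print(f"NUMA-aware:      {aware:.0f} ns")
```

Under these assumptions, blind page placement lands you much closer to the remote latency than the local one, which is exactly the "average latency will suffer" effect without ccNUMA support.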
Tenchusatsu