Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: Ali Chen who wrote (225603)   2/8/2007 4:06:48 PM
From: fastpathguru
 
I am sure the things are much more complicated than that, but the bottom line is that the reality test of embedded controller concept did not reveal tremendous superiority of this approach, and was successfully countered by Intel with bigger caches, hardware prefetches, and processor-specific compiler improvements.

Well, that about sums it up, and I'd agree: It's not "tremendously" superior, just "somewhat" superior... In a 1S system.

The margin of superiority is small enough that "bigger caches, hardware prefetches, and processor-specific compiler optimizations" are enough to overcome it... In a 1S system.

Clearly, the combination of IMC and DCA has demonstrated actual "tremendous superiority" as system scale increases. It takes a lot more (specialized dual-FSB chipset & MB) to overcome the native capabilities of a 2S Opteron system.

In 4S, forget it.

You've got it backwards, IMHO; AMD doesn't need Intel-like caches to catch Intel, Intel needs Intel-like caches to catch AMD (and it only works in small-scale systems, where AMD's IMC provides it the least benefit.)

I think AMD's L3 victim cache will help them in all the right ways, improving single-thread performance where Intel's big shared caches are giving them the most advantage.

And then there's ZRAM hanging out somewhere over the horizon...

fpg



To: Ali Chen who wrote (225603)   2/8/2007 6:37:20 PM
From: pgerassi
 
Dear Ali:

HT traffic for I/O, at most, uses some XBAR resources. But the XBAR is somewhere between 8 and 32 bytes wide, DDR, running at CPU clock. So for a 2GHz K8, that's somewhere between 32 and 128GB/s. A few GB/s of I/O traffic isn't going to cause it to block memory traffic. With HT 1.0, each HT link carries at most 6.4GB/s. With DDR2-800, you get 12.8GB/s of memory plus 19.2GB/s for three links, for a total of 32GB/s. That's the minimum XBAR throughput. So even fully loaded, HT traffic can't reduce memory BW or increase its latency, whether it's locally or remotely demanded.
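For anyone who wants to check the arithmetic, here is a rough Python sketch of that budget. The figures are the ones quoted in the paragraph above. The function and constant names are made up for illustration, and "DDR" is taken to mean two transfers per CPU clock, which is an assumption about how the XBAR is clocked rather than a documented spec.

```python
# Back-of-the-envelope check of the K8 crossbar (XBAR) budget.
# All figures come from the post; the names are hypothetical.

TRANSFERS_PER_CLOCK = 2   # assumption: "DDR" = two transfers per clock
HT1_LINK_GBS = 6.4        # per HT 1.0 link, as quoted
HT_LINKS = 3              # three links on this part, as quoted
DDR2_800_GBS = 12.8       # DDR2-800 memory bandwidth, as quoted


def xbar_bandwidth_gbs(width_bytes, clock_ghz=2.0):
    """XBAR throughput in GB/s for a given width and CPU clock."""
    return width_bytes * TRANSFERS_PER_CLOCK * clock_ghz


def k8_peak_demand_gbs():
    """Worst case: memory plus every HT link saturated at once."""
    return DDR2_800_GBS + HT_LINKS * HT1_LINK_GBS


if __name__ == "__main__":
    low = xbar_bandwidth_gbs(8)     # narrowest guess for the XBAR
    high = xbar_bandwidth_gbs(32)   # widest guess for the XBAR
    demand = k8_peak_demand_gbs()
    print(f"XBAR range: {low:.1f} to {high:.1f} GB/s")
    print(f"Peak demand: {demand:.1f} GB/s")
    # Small tolerance so float rounding does not flip the comparison.
    print("Fits even in the narrowest case:", demand <= low + 1e-9)
```

The point of the comparison on the last line is that the worst-case demand only just equals the narrowest XBAR estimate, so any wider XBAR leaves slack.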

With Barcelona, HT 3.0 and DDR3-1200, there are four HT links at 20.8GB/s each (83.2GB/s) plus 19.2GB/s for DDR3-1200, for a total of 102.4GB/s. Look for the XBAR to be made 32 bytes wide (256 bits) to accommodate all the HT and memory traffic even at 2GHz. It dovetails nicely with 256-bit-wide L1 to L2 to L3 paths. Given a 128GB/s XBAR, memory BW can be up to DDR3-2800 (44.8GB/s) with all four links saturated. That is a lot of leeway for future memory speeds.
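The same kind of check works for the Barcelona numbers. This is only a sketch: the names are hypothetical, and the 16 bytes per memory transfer (two 64-bit channels) is an assumption inferred from the post's own DDR3-1200 = 19.2GB/s figure.

```python
# Headroom check for the speculated 32-byte-wide Barcelona XBAR.
# All figures come from the post; the names are hypothetical.

HT3_LINK_GBS = 20.8       # per HT 3.0 link, as quoted
HT_LINKS = 4              # four links, as quoted
DDR3_1200_GBS = 19.2      # DDR3-1200 memory bandwidth, as quoted
BYTES_PER_TRANSFER = 16   # assumption: two 64-bit memory channels


def xbar_gbs(width_bytes=32, clock_ghz=2.0, transfers_per_clock=2):
    """Speculated XBAR throughput: width * DDR * CPU clock."""
    return width_bytes * transfers_per_clock * clock_ghz


def total_demand_gbs():
    """All HT links saturated plus DDR3-1200 memory traffic."""
    return HT_LINKS * HT3_LINK_GBS + DDR3_1200_GBS


def memory_headroom():
    """Bandwidth left for memory once HT is saturated.

    Returns (GB/s, equivalent DDR3 rating in MT/s).
    """
    left_gbs = xbar_gbs() - HT_LINKS * HT3_LINK_GBS
    return left_gbs, left_gbs * 1000 / BYTES_PER_TRANSFER


if __name__ == "__main__":
    gbs, mts = memory_headroom()
    print(f"Total demand: {total_demand_gbs():.1f} GB/s")
    print(f"Memory headroom: {gbs:.1f} GB/s, about DDR3-{mts:.0f}")
```

Run as a script, it should land on the 102.4GB/s total and the DDR3-2800 ceiling quoted above, give or take float rounding.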

Pete