Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: fastpathguru who wrote (225773), 2/12/2007 3:11:29 PM
From: Ali Chen
 
fpg, "Cache lines are 64B(ytes) (not b(its)). I don't think the wider L1-core bus will cause any cache thrashing.

And yes, it won't affect the characteristics of memory-bound apps too much, but for loops running out of L1, especially those using SSE registers, it should help quite a bit. Or to put it another way, the doubled FP unit would probably be starved without a complementary increase in bandwidth to L1."
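(Just to make the scenario fpg describes concrete, here is my own illustration, not anything from fpg or AMD: a packed-SSE loop whose working set fits in L1, where each iteration issues two 128-bit loads and one 128-bit store, so the FP units are fed only as fast as L1 can deliver those loads.)

/* Hypothetical example: SSE kernel computing x[i] += a * y[i], running out of L1.
 * Assumes x and y are 16-byte aligned and n is a multiple of 4. */
#include <xmmintrin.h>

void axpy_l1(float *x, const float *y, float a, int n)
{
    __m128 va = _mm_set1_ps(a);
    for (int i = 0; i < n; i += 4) {
        __m128 vx = _mm_load_ps(&x[i]);   /* 128-bit load from L1  */
        __m128 vy = _mm_load_ps(&y[i]);   /* 128-bit load from L1  */
        vx = _mm_add_ps(vx, _mm_mul_ps(va, vy));
        _mm_store_ps(&x[i], vx);          /* 128-bit store to L1   */
    }
}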


First, I didn't discuss performance issues with L1; I was concerned about the lack of development to deal with L2/L3 misses.

Second, the memory width is still 8 bytes = 64 bits per DIMM. Delivery of critical data is critical for CPU performance (pardon my pun), so delivering 64 or 128 bits at once does not improve it much, just as I said. However, if the dual-DIMM controller is smart enough to configure the burst length to be half of a single-DIMM setup, then cache thrashing is a non-issue, and it may somewhat reduce overall memory latency; that's true.
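(A back-of-the-envelope sketch of the burst-length point, using my own numbers, assuming a 64-byte line and an 8-byte data path per DIMM, not anything published by AMD:)

/* Hypothetical arithmetic: bus beats needed to fill one 64-byte cache line. */
#include <stdio.h>

int main(void)
{
    const int line_bytes = 64;  /* cache line size            */
    const int dimm_bytes = 8;   /* 64-bit data path per DIMM  */

    /* single-DIMM (64-bit) path: burst of 8 beats per line fill */
    int beats_single = line_bytes / dimm_bytes;

    /* ganged dual-DIMM (128-bit) path: 16 bytes per beat, so a
       burst of only 4 beats covers the same 64-byte line        */
    int beats_dual = line_bytes / (2 * dimm_bytes);

    printf("single-DIMM fill: %d beats\n", beats_single);  /* 8 */
    printf("dual-DIMM fill:   %d beats\n", beats_dual);    /* 4 */
    return 0;
}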

Third, you sound like a K10 performance architect. Unfortunately, from the information floating around (the sysmanager demo vs. "simulations" vs. the lack of real benchmark numbers), I see that reality doesn't want to cooperate with the assumptions made by AMD's design management team. Welcome to the real world, fpg :-(

Cheers,

- Ali