Silicon Investor (SI) -- The First Internet Community



Technology Stocks : Advanced Micro Devices - Moderated (AMD)

To: Rink who wrote (238785)	8/15/2007 3:24:08 AM
From: pgerassi

Dear Rink:
Before L3 is consulted, there are already two cache levels to check for the presence of an address. It has already taken 9 CPU cycles to find out that the line is not in the L1 or L2 caches local to the core. ZRAM has an access time of about 1ns or so. For a 2GHz core that is only 2 cycles to SRAM's one cycle. At 4GHz, it's 4:1.
Now the L2 uses 8 cycles to check whether an address is within cache at 2 ways a cycle. If you check 8 ways in a cycle, then the check only takes two access cycles. Similarly for K10's L3, which is to be 32-way set associative: if you check all 32 ways at the same time, the check takes a single access cycle, or only 4 CPU cycles at 4GHz. As to moving data in and out, you can use a multiplexer to push the 576 bits (64 bytes plus ECC) through quickly to the SRAM L1 and L2 in an access cycle. So even L3 only needs an additional 3 access cycles (one to see if it's in L3, one to move the resulting cache line to L1 and one to move the evicted line back into L3), or just 12 more CPU cycles at 4GHz. Parallel to that is the lookup in the other cores' L1 and L2 caches. So in a total of 21 cycles at 4GHz, you know that the data requested is not within the cores or L3.
One of the reasons why the SRAM caches look up addresses a small number of ways per cycle is that high-speed SRAM uses quite a bit of power. Because ZRAM is slower, the logic checking the ZRAM uses far less power. Since we see that CPUs consume 5-10 times the power to go twice as fast, logic that runs at 20% of the speed should only take 1/50th the power. Sixteen times that much logic (32 ways at once instead of 2) still saves more than 2/3rds of the power yet performs the check 3.2 times faster.
The big savings of ZRAM come in the die area used for a given size of cache, and the bigger the cache, the more advantageous ZRAM is over SRAM. This has two reasons. One is that as the cache gets bigger, the wireline delays start to predominate over the inherent access time of the memory. Two is that for slower, lower-power CPUs the access-time gap between SRAM and ZRAM is smaller.
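The power estimate above is simple enough to check by hand, but a short sketch makes the steps explicit. The 20% speed, 1/50th power, and 2-versus-32 ways figures are taken from the post; the constant and function names are made up for illustration, and the scaling rule itself is the poster's rough estimate, not a measured result.

```python
from fractions import Fraction

# Figures from the post; the names are illustrative only.
RELATIVE_SPEED = Fraction(1, 5)    # slower logic runs at 20% of the SRAM-side speed
RELATIVE_POWER = Fraction(1, 50)   # and is estimated to draw 1/50th of the power
SRAM_WAYS_PER_CYCLE = 2            # ways compared per cycle by the fast SRAM logic
ZRAM_WAYS_PER_ACCESS = 32          # all ways of a 32-way L3 compared at once


def parallel_check_tradeoff():
    """Return (width, power, check_rate), each relative to the SRAM checker."""
    width = Fraction(ZRAM_WAYS_PER_ACCESS, SRAM_WAYS_PER_CYCLE)  # 16x the compare logic
    power = width * RELATIVE_POWER        # total power of the wider, slower checker
    check_rate = width * RELATIVE_SPEED   # ways checked per unit of time
    return width, power, check_rate


width, power, check_rate = parallel_check_tradeoff()
print(f"{width}x wider: {float(power):.0%} of the power, "
      f"{float(check_rate):.1f}x the check rate")
```

Under those assumptions the wider checker draws 16/50 of the power, which is where the "more than 2/3rds" saving comes from.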
Where the total above is 21 cycles in the 4GHz case, at 2GHz it is 15 cycles. At 1GHz, SRAM holds no access-time advantage over ZRAM, while ZRAM has lower wireline delays. How big would a Shanghai with a 30MB SRAM L3 cache be at 45nm? About 610mm². A Shanghai with a 30MB ZRAM L3 cache would be the same size as a Shanghai with a 6MB SRAM L3 cache, about 270mm². Which would you rather have: a 6MB L3 K10 QC, or a 30MB L3 K10 QC with another 3 cycles of L3 latency at the same price? Most would opt for the latter for server and HPC work.
Pete
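The cycle counts in this post all follow from one formula, sketched here as a rough check. The 9 local cycles, the roughly 1ns ZRAM access time, and the three L3 access steps are taken from the post; the function names and the rounding rule are assumptions made for illustration.

```python
LOCAL_MISS_CYCLES = 9   # CPU cycles to learn the line is not in the core's L1/L2
ZRAM_ACCESS_NS = 1.0    # approximate ZRAM access time
L3_ACCESS_STEPS = 3     # tag check, fill into L1, write the evicted line back to L3


def cycles_per_access(core_ghz, access_ns=ZRAM_ACCESS_NS):
    """CPU cycles spanned by one memory access; never less than one cycle."""
    return max(1, round(core_ghz * access_ns))


def l3_miss_cycles(core_ghz):
    """Cycles until the core knows the data is in neither the local caches nor L3."""
    return LOCAL_MISS_CYCLES + L3_ACCESS_STEPS * cycles_per_access(core_ghz)


for ghz in (4, 2, 1):
    print(f"{ghz} GHz: {l3_miss_cycles(ghz)} cycles")
```

At 4GHz and 2GHz this reproduces the 21 and 15 cycles quoted in the post. At 1GHz each access fits in a single cycle, which is the point at which SRAM no longer has an access-time advantage.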

To: Rink who wrote (238785)	8/15/2007 10:13:58 AM
From: Joe NYC

Rink,
"If so it is likely to influence core performance because AMD's core design seems to prefer lower latency for relatively small cache over higher latency large cache."
L3 is completely independent of the core(s). AMD will be selling Barcelona-derived processors with or without L3. L3 is there mainly to reduce latency to main memory.
Joe

To: Rink who wrote (238785)	8/15/2007 10:43:30 AM
From: combjelly

"If so it is likely to influence core performance because AMD's core design seems to prefer lower latency for relatively small cache over higher latency large cache." In general, that would be true. But because of the way the L3 is implemented, as long as the latency is better than the main memory, it is a win. Now, with a large enough L3 with respect to the L1 and L2, at some point moving to an inclusive L3 would be a bigger win than with an exclusive L3 because the traffic due to evictions is eliminated. So, the larger question is, has AMD put the ability to use either an inclusive or exclusive L3 in Shanghai? If they have, then ZRAM becomes a real possibility. If not, then likely not.