Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: Tenchusatsu who wrote (238696) | 8/13/2007 5:00:53 PM
From: wbmw
 
Re: That's pretty impressive, given that the access time of SRAM is around 10ns, LOL ...

Depends on what level of cache you're talking about. I'm amazed that Core 2 at 3.0GHz can have an L2 with 4MB of capacity and a 12-cycle access time. That's a 4ns access time....

Meanwhile, the L1 can be accessed in 3 cycles, or a 1ns access time. Of course, the L1 is only 32KB each for data and instructions.

ZRAM gives more capacity at the expense of added latency. If its latency is greater than SRAM's, and the capacity is much greater, then there's no way you are going to see a 1-2ns access time. Maybe 10x that number, or more....
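The cycle-to-nanosecond figures quoted above are easy to sanity-check; a minimal sketch, assuming the 3.0GHz clock from the post:

```python
def cycles_to_ns(cycles, freq_ghz=3.0):
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles / freq_ghz

# The post's figures: a 12-cycle L2 and a 3-cycle L1 at 3.0 GHz.
l2_ns = cycles_to_ns(12)  # 4.0 ns
l1_ns = cycles_to_ns(3)   # 1.0 ns
print(l2_ns, l1_ns)
```

The same function shows why 1-2ns is a tall order for a slower cell: at 3.0GHz that window is only 3-6 cycles, less than the L2 lookup alone.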



To: Tenchusatsu who wrote (238696) | 8/13/2007 5:45:11 PM
From: pgerassi
 
Dear Tench:

SRAM used in cache has a 200-400ps access time, or less than one clock cycle. In L1, the SRAM is accessed three times: once for both set addresses, once for the data, and once to mark the recently read set. In L2, the 16 sets are accessed 2 at a time, the data is read and moved to L1 in one shot, the L1's LRU set is sent back into the L2 set's place, and the recently viewed set is marked. All of this takes up to 17 cycles; adding the 3 cycles for the L1 lookup yields the 20-cycle L2 access time.

ZRAM access time is about 5 times as long. This isn't a problem for L3, since it takes 20 cycles to check L1 and L2 anyway. Given the higher density, a 32-way L3 can look up 16 sets per 2-4 cycle period for two periods, send down the data found in one period, write the L2 cache eviction into L3, and finally mark the recently viewed set. This takes 5 periods, or 10-20 cycles, for a total of 30-40 cycles, which is still about 1/3 that of main memory DRAM (45ns, or 90 to 150 cycles). The pipe between L2 and L3 should be 512 bits wide, or 256 bits DDR.
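That cycle accounting can be checked with a quick back-of-the-envelope sketch; the period lengths, 20-cycle L1/L2 check, and 45ns DRAM figure are the estimates from the post, not measured numbers:

```python
def l3_access_cycles(period_cycles, l1_l2_check=20, periods=5):
    """Total L3 latency: miss in L1/L2 first, then 5 ZRAM periods of 2-4 cycles each."""
    return l1_l2_check + periods * period_cycles

best = l3_access_cycles(2)   # 20 + 5*2 = 30 cycles
worst = l3_access_cycles(4)  # 20 + 5*4 = 40 cycles

# DRAM at 45 ns spans roughly 90-150 cycles across 2.0-3.3 GHz clocks:
dram_low = 45 * 2.0  # 90 cycles at 2.0 GHz
print(best, worst, dram_low)
```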

ZRAM is most effective at very large cache sizes, where the density helps reduce overall latency (at those sizes the wire delays are longer than an access). With roughly 5x the density, the wire delays drop to about 0.447 of their SRAM value (5^-0.5), a reduction of roughly 5/9ths.

A hybrid of SRAM for the set addresses and flags, plus ZRAM for the cache data, still nets about a 4x cache density increase while dropping the L3 latency to 7 to 11 cycles, for a total of 27 to 31 cycles, relative to the 90 to 150 cycles to DRAM.
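Putting the thread's estimates side by side (all figures are the poster's back-of-the-envelope numbers, not measurements):

```python
L1_L2_CHECK = 20  # cycles spent missing in L1 and L2 before L3 is consulted

# (low, high) total load-to-use latency in cycles for each option:
latencies = {
    "pure ZRAM L3": (L1_L2_CHECK + 10, L1_L2_CHECK + 20),  # 30-40
    "hybrid SRAM-tag/ZRAM-data L3": (L1_L2_CHECK + 7, L1_L2_CHECK + 11),  # 27-31
    "DRAM (45 ns)": (90, 150),
}
for name, (lo, hi) in latencies.items():
    print(f"{name}: {lo}-{hi} cycles")
```

The hybrid narrows the gap to SRAM tags while keeping most of ZRAM's density advantage, which is the trade-off the paragraph above is arguing for.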

Pete



To: Tenchusatsu who wrote (238696) | 8/13/2007 5:47:12 PM
From: Joe NYC
 
Tenchu,

Re: So you agree that ZRAM would have a 1-2ns access time?

Not with that part, but with the part that a ZRAM-based Shanghai would have much more cache than the currently scheduled 6 MB. Therefore, Shanghai's L3 will be SRAM-based, not ZRAM.

BTW, I also said that ZRAM is slower than SRAM, and if you add that it would most likely be used for L3, meaning there is far more lookup logic on a large cache than on a smaller L1 or L2, then 1-2ns is definitely out of the question.

Joe