

To: Trent George who wrote (114177), 6/4/2000 11:28:00 PM
From: pgerassi
 
Dear Trent:

Athlon has a 64-byte cache line on the K75. I will assume it is the same on Thunderbird/Duron. A 16-way set-associative lookup can be done in parallel. The calculations are simple:

1) 256 KB L2 cache / 64-byte cache line = 4096 cache lines.

2) 4096 cache lines / 16 ways = 256 sets.

3) 4 GB (32-bit byte addresses) / 256 sets = 16 MB per set.

4) log base 2 of (16M) = 24 bits.

5) 24 bits per set address - 6 bits per cache-line offset = 18 bits per set line address (the tag).

6) The computer must compare each 18-bit set line address against the corresponding 18 bits of the 32-bit memory address. Eight bits select the set; these are usually the bits just above the 6 cache-line offset bits, which are the least significant. So A5-A0 select the byte within the cache line, A13-A6 select the set, and A31-A14 form the set line address. The compare happens for all 16 ways at once, so 18 x 16 = 288 parallel bit compares, plus sixteen 18-input ANDs to signal whether a cache hit or miss occurred. This yields the fastest L2 cache possible (less than one cycle of additional L2 latency).
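
To make the arithmetic concrete, here is a minimal C sketch assuming the K75 geometry above (the struct and function names are mine, not AMD's). It derives the line, set, and tag counts and splits a 32-bit address into the A31-A14 / A13-A6 / A5-A0 fields; the loop in lookup() is the sequential software stand-in for the 288 parallel bit compares done in hardware:

#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

/* Assumed geometry from the post: 256 KB L2, 64-byte lines, 16-way. */
#define L2_SIZE      (256 * 1024)            /* bytes */
#define LINE_SIZE    64                      /* bytes per line */
#define WAYS         16

#define NUM_LINES    (L2_SIZE / LINE_SIZE)   /* 4096 */
#define NUM_SETS     (NUM_LINES / WAYS)      /* 256  */

#define OFFSET_BITS  6                       /* log2(64)  -> A5-A0  */
#define INDEX_BITS   8                       /* log2(256) -> A13-A6 */
#define TAG_BITS     (32 - INDEX_BITS - OFFSET_BITS)  /* 18 -> A31-A14 */

/* Split a 32-bit address into the three fields. */
static uint32_t addr_offset(uint32_t a) { return a & (LINE_SIZE - 1); }
static uint32_t addr_index (uint32_t a) { return (a >> OFFSET_BITS) & (NUM_SETS - 1); }
static uint32_t addr_tag   (uint32_t a) { return a >> (OFFSET_BITS + INDEX_BITS); }

/* One set: 16 stored tags plus valid bits (illustrative only). */
struct set {
    uint32_t tag[WAYS];
    bool     valid[WAYS];
};

/* Hardware compares all 16 ways at once (18 x 16 = 288 bit compares);
   this loop is the sequential equivalent of that parallel check. */
static int lookup(const struct set *s, uint32_t addr)
{
    uint32_t tag = addr_tag(addr);
    for (int way = 0; way < WAYS; way++)
        if (s->valid[way] && s->tag[way] == tag)
            return way;          /* hit: returns the matching way */
    return -1;                   /* miss */
}

int main(void)
{
    printf("lines=%d sets=%d tag bits=%d\n", NUM_LINES, NUM_SETS, TAG_BITS);

    uint32_t a = 0xDEADBEEF;
    printf("addr %08X -> tag=%05X set=%u offset=%u\n",
           (unsigned)a, (unsigned)addr_tag(a),
           (unsigned)addr_index(a), (unsigned)addr_offset(a));
    return 0;
}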

A 64-byte cache line can be transferred between L1 and L2 in one cycle with a 512-bit bus. This may be easy, since the 64 KB L1 instruction and data caches probably each use a pair of 256 Kbit SRAM arrays (512 bits by 512 bits), and the L2 uses eight such arrays. Thus, this would allow a two-cycle L2 latency. Reducing the width of the data path or the address-decode parallelism just increases the latency.
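
As a quick check on those array counts (a back-of-the-envelope sketch; the numbers are assumed from this post, not taken from AMD documentation):

#include <stdio.h>

int main(void)
{
    const int array_bits = 512 * 512;      /* one SRAM array: 256 Kbit      */
    const int l1_bits    = 64 * 1024 * 8;  /* one 64 KB L1 cache = 512 Kbit */
    const int l2_bits    = 256 * 1024 * 8; /* 256 KB L2 = 2 Mbit            */

    printf("arrays per 64 KB L1 cache: %d\n", l1_bits / array_bits); /* 2 */
    printf("arrays for 256 KB L2:      %d\n", l2_bits / array_bits); /* 8 */

    /* A 64-byte line is 512 bits, so a 512-bit bus moves it in one beat. */
    printf("cycles to move one line:   %d\n", (64 * 8) / 512);       /* 1 */
    return 0;
}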

The AMD technical specs should show how this is truly implemented.

Pete