Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: Joe NYC who wrote (59089)10/18/2001 10:05:14 AM
From: combjelly
 
"and I don't know if it is possible to have an on-chip memory controller plus off-chip L3"

Hard to say how you would do that. But AMD doesn't have to go off-chip for an L3. According to an eet.com report, a Swiss team has developed an SOI DRAM cell that eliminates the capacitor by exploiting the "floating body effect". With read and write times of less than 3 ns it is fairly fast, and the cell is small: 0.04 square microns at 100 nm, maybe 0.1 square microns at 130 nm. So even with the extra circuitry to make it work and the wiring to connect everything, an OEM should be able to shoehorn 512 KB into 1 mm^2. An on-chip 4 MB L3 cache in less than 10 mm^2 should be worth it.
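The density estimate above can be sanity-checked with a little arithmetic. This is a sketch, not data from the eet.com report: the 0.1 square-micron cell size is the post's own guess for 130 nm, and the 60% overhead factor for sense amps, decoders, and wiring is an assumption I've added.

```python
# Sanity-check the density claim: an SOI DRAM cell of ~0.1 um^2
# (the post's estimate for a 130 nm process) packed into 1 mm^2.

CELL_AREA_UM2 = 0.1          # assumed bit-cell area at 130 nm
MM2_IN_UM2 = 1_000_000       # 1 mm^2 = 10^6 um^2
OVERHEAD = 0.6               # assumed fraction lost to sense amps, decoders, wiring

raw_bits = MM2_IN_UM2 / CELL_AREA_UM2          # bits if cells tiled perfectly
usable_bytes = raw_bits * (1 - OVERHEAD) / 8   # bytes left after array overhead

print(f"raw bits per mm^2:  {raw_bits:,.0f}")
print(f"usable KB per mm^2: {usable_bytes / 1024:.0f}")
```

With these assumptions the result comes out to roughly 488 KB per mm^2, in the same ballpark as the 512 KB/mm^2 figure in the post.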



To: Joe NYC who wrote (59089)10/18/2001 11:22:10 AM
From: dale_laroy
 
>The next generation of AMD processors will have an on-chip memory controller, and I don't know if it is possible to have an on-chip memory controller plus off-chip L3. An off-chip L3 would not have the performance increase of on-chip. Off-chip L3 would need some kind of lookup mechanism, which causes additional delay in memory access.<

This is not entirely true. If one does not maintain any open pages, the cache tag lookup of an L3 cache controller would certainly add delay, but it is possible to design an L3 cache controller with circuitry fast enough to match the delay of a typical open-page hit detection circuit.

>It would be interesting if someone could calculate how much L3 you would need to equal the performance of, say, an additional 256K of L2.<

This would depend upon the architecture of the L3 cache. For example, if one were to integrate the L3 cache into the DRAM itself, it would be possible to use very large cache lines by defining each open DRAM page as a cache line. Such an architecture could exceed the performance of 512KB of integrated L2 cache using only 256KB of integrated L2 cache plus 256KB of off-chip L3 cache.

Building upon the Virtual Channel Memory architecture, it would be possible to integrate as much as one page of SRAM cache for every 64 bits of DRAM. If the page size of the integrated L3 SRAM cache is 4KB, to match the granularity of the PMMU of the x86 architecture, 16MB of 16-way set associative L3 cache could be controlled with an L3 cache controller of the same size and complexity as the controller for the 256KB of L2 cache on the Athlon. When you consider how long the Athlon takes to complete its lookup in the 16-way set associative integrated L2 cache, it becomes obvious that the L3 cache controller could add as little as one FSB cycle to the access versus a no-open-page architecture, and even a single-open-page architecture adds at least one FSB cycle.
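The size-and-complexity claim above can be checked by counting tag-array entries. This sketch assumes a 64-byte line size for the Athlon's L2 (a detail not stated in the post); the 16-way associativity and the 16MB/4KB L3 parameters come from the post itself.

```python
# Compare tag-array bookkeeping for the Athlon's integrated L2
# against the proposed VCM-style L3 with 4 KB lines.

def tag_array(size_bytes, line_bytes, ways):
    """Return (total lines, sets) for a set-associative cache."""
    lines = size_bytes // line_bytes
    sets = lines // ways
    return lines, sets

# Athlon's 256 KB 16-way L2; 64 B line size is an assumption.
l2 = tag_array(256 * 1024, 64, 16)
# Proposed 16 MB 16-way L3 with 4 KB (page-sized) lines.
l3 = tag_array(16 * 1024 * 1024, 4096, 16)

print("L2 (lines, sets):", l2)   # (4096, 256)
print("L3 (lines, sets):", l3)   # (4096, 256)
```

Both caches work out to 4096 lines in 256 sets, so the L3 controller's tag lookup involves the same number of entries as the L2's, which is the basis for the claim that it need be no larger or slower.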