Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: Ali Chen who wrote (75740), 3/27/2002 7:36:49 PM
From: pgerassi
 
Dear Ali:

Those figures do not account for memory contention. Many of those pipelining examples also assume well-predicted access patterns, which do NOT exist in high-end server workloads. And don't trot out streaming examples, since streaming code needs even less cache than normal code.

Also, pipelined requests need something in common, as well as free banks or pages, and those assumptions only hold if little or no writing occurs. The numbers get worse for the registered (buffered) memory needed to reach 4GB in current systems: the access is 2T + 3.5 + 3.5, for a total of 9 cycles at 6ns/cycle. With RAMBUS, 1GB per channel uses the maximum of 32 devices, which adds 62ns on top of the single-device minimum latency of currently 35ns (read: very expensive), and that doesn't include memory controller latencies and the rest. So 54ns for PC2700 and 97ns for PC1200 are much higher than the data transfer times of 24ns for PC2700 and 27ns for PC1200 (64-byte cache fill). And these are theoretical minimums; real-world figures will be higher.
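To make the arithmetic above concrete, here is a back-of-the-envelope sketch using the figures quoted in this post (the only assumption of mine is reading the 27ns PC1200 transfer time as implying roughly 2.4 GB/s of channel bandwidth):

```python
# Back-of-the-envelope latency math using the figures quoted above.

# Registered (buffered) PC2700 (DDR-333): clock period is 6 ns.
cycle_ns = 6.0
access_cycles = 2 + 3.5 + 3.5                  # 2T + 3.5 + 3.5, as quoted
pc2700_latency_ns = access_cycles * cycle_ns   # 9 cycles * 6 ns = 54 ns

# RAMBUS channel fully populated with 32 devices (1GB/channel):
rdram_device_ns = 35.0                         # minimum single-device latency
rdram_chain_ns = 62.0                          # added by the 32-device chain
rdram_latency_ns = rdram_device_ns + rdram_chain_ns   # 97 ns

# Transfer time for a 64-byte cache line:
pc2700_xfer_ns = 64 / 2.7                      # PC2700 at 2.7 GB/s -> ~24 ns
pc1200_xfer_ns = 64 / 2.4                      # assumed ~2.4 GB/s -> ~27 ns

print(pc2700_latency_ns, rdram_latency_ns,
      round(pc2700_xfer_ns), round(pc1200_xfer_ns))
```

In both cases latency is roughly twice the transfer time or worse, which is the point: the fill is dominated by waiting, not by moving bytes.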

Most real-world code is quite random in its cache-fill pattern. Applications that run mostly in cache actually see higher latency to memory (pipelining fails because the memory subsystem sits idle). And this does not take into account OS system calls, which use entirely different memory areas and cause a preponderance of cache flushing and refilling after each call (just think of the reads and writes for saving registers, etc.).

Just take a real-world example. Check what happens when you run a benchmark at CAS 2 and at CAS 3. How much performance increase occurs between 3 and 2? How much increase going from FSB 100 at CAS 2 to FSB 133 at CAS 3 (the increase in CAS keeps latency about the same; make sure PCI and AGP are not overclocked)? Given previous attempts at finding this out, the result was that latency was more important than bandwidth.
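Why that particular pairing of settings isolates bandwidth can be checked with simple arithmetic (my own sketch, not from the original post):

```python
# CAS latency in nanoseconds: CAS cycles times the clock period.
def cas_latency_ns(fsb_mhz, cas_cycles):
    return cas_cycles * 1000.0 / fsb_mhz

fsb100_cas2 = cas_latency_ns(100, 2)   # 20.0 ns
fsb133_cas3 = cas_latency_ns(133, 3)   # ~22.6 ns

# Raising CAS from 2 to 3 nearly cancels the faster clock, so the two
# settings have roughly the same access latency while FSB 133 offers
# ~33% more bandwidth. If the benchmark barely improves, bandwidth
# wasn't the bottleneck -- latency was.
print(fsb100_cas2, round(fsb133_cas3, 1))
```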

Pete