Politics : Formerly About Advanced Micro Devices


To: John Walliker who wrote (77083)10/26/1999 10:52:00 AM
From: Ali Chen
 
Petz wrote <The percentage of time that 800 MB/sec is not enough may well be more than 2.5%. But I suspect that just one additional clock cycle of latency to arbitrate the RIMM channels is enough to make the bandwidth advantage meaningless.>

You asked:
"Could you be more specific about exactly what you mean please?"

Let me try, in highly simplified terms:

Access to memory usually happens in chunks of data with the granularity of a cacheline, 32 bytes at a time. On lucky occasions, when contiguous address areas are being accessed AND the rate of requests is above a certain speed, those cacheline chunks can be combined by the DRAM controller into one long contiguous burst. That is when the peak bandwidth may be reached for some period of time.
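
To make the "combined into one burst" idea concrete, here is a toy C model of a controller that merges cacheline requests into bursts when they are contiguous and arrive quickly enough. It is only an illustration; the 4-clock merge window, the addresses and the gaps are made-up numbers, not how any real chipset works.

/* Toy model (not any real chipset): cacheline-granular requests that a
   memory controller merges into bursts when they are contiguous and
   arrive quickly enough.  The merge window, addresses and gaps are
   made-up numbers for illustration only. */
#include <stdio.h>

#define CACHELINE    32   /* bytes per request, as described above        */
#define MERGE_WINDOW  4   /* max idle bus clocks before a burst is closed */

int main(void)
{
    /* each request: starting address and gap (bus clocks) since the
       previous request */
    struct { unsigned long addr; int gap; } req[] = {
        { 0x1000, 0 }, { 0x1020, 1 }, { 0x1040, 2 },  /* contiguous, fast */
        { 0x8000, 9 },                                /* far away, late   */
        { 0x8020, 1 }, { 0x8040, 1 },
    };
    int n = (int)(sizeof req / sizeof req[0]);
    int bursts = 0;
    unsigned long next_addr = 0;

    for (int i = 0; i < n; i++) {
        int contiguous = (req[i].addr == next_addr);
        int soon       = (req[i].gap <= MERGE_WINDOW);
        if (!(contiguous && soon))
            bursts++;                 /* page closes; a new burst starts */
        next_addr = req[i].addr + CACHELINE;
    }
    printf("%d cacheline requests merged into %d bursts\n", n, bursts);
    return 0;
}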

If the rate of requests from the other buses (FSB, PCI/DMA, AGP) falls below a certain threshold, the DRAM controller has to "close the DRAM page" and then re-open it for the next request. This process involves a significant latency of 5 to 12 bus clocks during which no data gets transferred at all: dead time. In addition (or maybe for the same reason), these "optimal" bursts are frequently interrupted by write-back traffic (when old data from the L2 cache has to be written back into main memory before it is replaced with new data). Therefore the overall performance of the memory subsystem depends heavily on the ratio between burst transfers (when the full bandwidth is utilized) and those interruptions, when latency dominates. What Petz said is that every extra clock of latency in the RAMBUS channel may easily eat up all the advantage of the higher bandwidth. Because of the mechanisms described above, the peak bandwidth is never realized in any practical situation.
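
A back-of-the-envelope model shows how quickly those dead clocks eat into the peak number. The sketch below assumes a 100 MHz, 8-byte-wide channel (800 MB/s peak) and an 8-clock page re-open penalty, illustrative values taken from the ranges mentioned above, and computes the effective transfer rate as the page-miss rate grows.

/* Back-of-the-envelope effective bandwidth: each cacheline takes 4 data
   clocks on a 100 MHz, 8-byte bus (800 MB/s peak), plus an 8-clock
   page close/re-open penalty on some fraction of the requests.  The
   numbers are illustrative, not taken from any datasheet. */
#include <stdio.h>

int main(void)
{
    double bus_mhz       = 100.0;  /* bus clock, MHz                       */
    double bytes_per_clk =   8.0;  /* peak: 8 bytes/clock = 800 MB/s       */
    double line_bytes    =  32.0;  /* one cacheline per request            */
    double penalty_clks  =   8.0;  /* page re-open, within the 5..12 range */

    double xfer_clks = line_bytes / bytes_per_clk;     /* 4 clocks of data */

    for (double miss = 0.0; miss <= 1.0; miss += 0.25) {
        double avg_clks = xfer_clks + miss * penalty_clks;
        double mb_s     = bus_mhz * line_bytes / avg_clks;
        printf("page-miss rate %3.0f%% -> %3.0f MB/s effective\n",
               miss * 100.0, mb_s);
    }
    return 0;
}

With these assumed numbers, the effective rate at a 100% page-miss rate comes out to roughly a third of the peak.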

Even under highly specialized benchmark conditions like STREAM, where all addresses are contiguous by design (the copy kernel is sketched below), the average transfer rate of RAMBUS does not exceed 1/3 of the peak:
aceshardware.com
and the i820 system loses heavily to all the other platforms. My guess is that something is badly overlooked in the logic of the i820 controller, or in the whole RAMBUS protocol.
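
For reference, the STREAM "copy" kernel amounts to the loop below. This is only a skeleton: the real benchmark adds a timing harness and three more kernels (scale, add, triad), and the array size here is an arbitrary choice meant to be much larger than the caches.

/* Skeleton of a STREAM-style "copy" kernel: purely sequential reads and
   writes, the friendliest possible access pattern for a DRAM controller.
   The array size is an arbitrary choice, far larger than the caches;
   the real benchmark adds timing and reports MB/s. */
#include <stdio.h>
#include <stddef.h>

#define N (2 * 1024 * 1024)     /* 2M doubles = 16 MB per array */

static double a[N], b[N];

static void stream_copy(void)
{
    for (size_t i = 0; i < N; i++)
        b[i] = a[i];            /* contiguous addresses by design */
}

int main(void)
{
    for (size_t i = 0; i < N; i++)
        a[i] = 1.0;
    stream_copy();
    printf("copied %d MB\n", (int)((N * sizeof(double)) >> 20));
    return 0;
}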



To: John Walliker who wrote (77083)10/26/1999 1:25:00 PM
From: Petz
 
John Walliker, here is a layman's-terms explanation of my earlier remark: "The percentage of time that 800 MB/sec is not enough may well be more than 2.5%. But I suspect that just one additional clock cycle of latency to arbitrate the RIMM channels is enough to make the bandwidth advantage meaningless."

I was trying to explain why the i840 dual-channel RAMBUS motherboard offers only a 2.5% speed improvement over a BX chipset motherboard with one fourth of the memory bandwidth.

There are two reasons -
1. Most of the time (>95%) the memory<->CPU bandwidth of the BX chipset (800 MB/sec) is more than adequate.
2. The latency (the time to get the first byte of data when filling a cache line in the CPU) is greater for RDRAM than for SDRAM, and it is probably longer still for dual-channel RDRAM than for single-channel.

Taking 1) and 2) together, the extra time the CPU spends waiting for the first byte of data negates the (sometimes) shorter wait times that occur once a stream of data has started; the rough arithmetic is sketched below.
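
Here is that arithmetic in code form. Every number is an illustrative guess, not a measurement of BX, i820 or i840: it just shows how a latency penalty on the common, latency-bound fills can cancel the bandwidth gain on the rare, bandwidth-bound ones.

/* Rough arithmetic behind 1) and 2): split cacheline fills into the rare
   bandwidth-bound case, where the wider RDRAM channel helps, and the
   common latency-bound case, where its extra first-word latency hurts.
   Every number below is an illustrative guess, not a measurement. */
#include <stdio.h>

int main(void)
{
    double frac_bw_bound  = 0.05;  /* fills where 800 MB/s is not enough    */
    double gain_bw_bound  = 0.30;  /* fraction of time saved on those fills */
    double frac_lat_bound = 0.95;  /* fills dominated by first-word latency */
    double loss_lat_bound = 0.05;  /* extra wait on those fills (5% slower) */

    double net = frac_bw_bound  * gain_bw_bound
               - frac_lat_bound * loss_lat_bound;
    printf("net change in memory stall time: %+.1f%%\n", 100.0 * net);
    return 0;
}

With these made-up fractions the latency penalty roughly cancels the bandwidth gain, which is the point: unless the bandwidth-bound fraction is large, even a small extra arbitration delay wipes out the advantage.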

Petz