Politics : Formerly About Advanced Micro Devices


To: John Walliker who wrote (77083)10/26/1999 10:52:00 AM
From: Ali Chen
 
Petz wrote <The percentage of time that 800 MB/sec is not enough may well be more than 2.5%. But I suspect that just one additional clock cycle of latency to arbitrate the RIMM channels is enough to make the bandwidth advantage meaningless.>

You asked:
"Could you be more specific about exactly what you mean please?"

Let me try, in highly simplified terms:

Access to memory usually happens in chunks of data with the granularity of a cacheline, 32 bytes at a time. On lucky occasions, when contiguous address areas are being accessed AND the rate of requests is above a certain speed, those cacheline chunks can be combined by the DRAM controller into one long contiguous burst. That is when the peak bandwidth may be reached for some period of time.
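
To make the "combined into one burst" idea concrete, here is a toy C model of a controller that merges cacheline requests into bursts when they are contiguous and arrive quickly enough. It is only an illustration; the 4-clock merge window, the addresses and the gaps are made-up numbers, not how any real chipset works.

/* Toy model (not any real chipset): cacheline-granular requests that a
   memory controller merges into bursts when they are contiguous and
   arrive quickly enough.  The merge window, addresses and gaps are
   made-up numbers for illustration only. */
#include <stdio.h>

#define CACHELINE    32   /* bytes per request, as described above        */
#define MERGE_WINDOW  4   /* max idle bus clocks before a burst is closed */

int main(void)
{
    /* each request: starting address and gap (bus clocks) since the
       previous request */
    struct { unsigned long addr; int gap; } req[] = {
        { 0x1000, 0 }, { 0x1020, 1 }, { 0x1040, 2 },  /* contiguous, fast */
        { 0x8000, 9 },                                /* far away, late   */
        { 0x8020, 1 }, { 0x8040, 1 },
    };
    int n = (int)(sizeof req / sizeof req[0]);
    int bursts = 0;
    unsigned long next_addr = 0;

    for (int i = 0; i < n; i++) {
        int contiguous = (req[i].addr == next_addr);
        int soon       = (req[i].gap <= MERGE_WINDOW);
        if (!(contiguous && soon))
            bursts++;                 /* page closes; a new burst starts */
        next_addr = req[i].addr + CACHELINE;
    }
    printf("%d cacheline requests merged into %d bursts\n", n, bursts);
    return 0;
}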

If the rate of requests from the other buses (FSB, PCI/DMA, AGP) falls below a certain threshold, the DRAM controller has to "close the DRAM page" and then re-open it for the next request. This process involves a significant latency of 5 to 12 bus clocks during which no data gets transferred at all: dead time. In addition (or maybe for the same reason), these "optimal" bursts are frequently interrupted by write-back traffic (when old data from the L2 cache has to be written back into main memory before it is replaced with new data). Therefore the overall performance of the memory subsystem depends heavily on the ratio between burst transfers (when the full bandwidth is utilized) and those interruptions, when latency dominates. What Petz said is that every extra clock of latency in the RAMBUS channel may easily eat up all the advantage of the higher bandwidth. Because of the mechanisms described above, the peak bandwidth is never realized in any practical situation.
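
A back-of-the-envelope model shows how quickly those dead clocks eat into the peak number. The sketch below assumes a 100 MHz, 8-byte-wide channel (800 MB/s peak) and an 8-clock page re-open penalty, illustrative values taken from the ranges mentioned above, and computes the effective transfer rate as the page-miss rate grows.

/* Back-of-the-envelope effective bandwidth: each cacheline takes 4 data
   clocks on a 100 MHz, 8-byte bus (800 MB/s peak), plus an 8-clock
   page close/re-open penalty on some fraction of the requests.  The
   numbers are illustrative, not taken from any datasheet. */
#include <stdio.h>

int main(void)
{
    double bus_mhz       = 100.0;  /* bus clock, MHz                       */
    double bytes_per_clk =   8.0;  /* peak: 8 bytes/clock = 800 MB/s       */
    double line_bytes    =  32.0;  /* one cacheline per request            */
    double penalty_clks  =   8.0;  /* page re-open, within the 5..12 range */

    double xfer_clks = line_bytes / bytes_per_clk;     /* 4 clocks of data */

    for (double miss = 0.0; miss <= 1.0; miss += 0.25) {
        double avg_clks = xfer_clks + miss * penalty_clks;
        double mb_s     = bus_mhz * line_bytes / avg_clks;
        printf("page-miss rate %3.0f%% -> %3.0f MB/s effective\n",
               miss * 100.0, mb_s);
    }
    return 0;
}

With these assumed numbers, the effective rate at a 100% page-miss rate comes out to roughly a third of the peak.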

Even under highly specialized benchmark conditions like STREAM, where all addresses are contiguous by design (the copy kernel is sketched below), the average transfer rate of RAMBUS does not exceed 1/3 of the peak:
aceshardware.com
and the i820 system loses heavily to all the other platforms. My guess is that something is badly overlooked in the logic of the i820 controller, or in the whole RAMBUS protocol.
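
For reference, the STREAM "copy" kernel amounts to the loop below. This is only a skeleton: the real benchmark adds a timing harness and three more kernels (scale, add, triad), and the array size here is an arbitrary choice meant to be much larger than the caches.

/* Skeleton of a STREAM-style "copy" kernel: purely sequential reads and
   writes, the friendliest possible access pattern for a DRAM controller.
   The array size is an arbitrary choice, far larger than the caches;
   the real benchmark adds timing and reports MB/s. */
#include <stdio.h>
#include <stddef.h>

#define N (2 * 1024 * 1024)     /* 2M doubles = 16 MB per array */

static double a[N], b[N];

static void stream_copy(void)
{
    for (size_t i = 0; i < N; i++)
        b[i] = a[i];            /* contiguous addresses by design */
}

int main(void)
{
    for (size_t i = 0; i < N; i++)
        a[i] = 1.0;
    stream_copy();
    printf("copied %d MB\n", (int)((N * sizeof(double)) >> 20));
    return 0;
}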



To: John Walliker who wrote (77083)10/26/1999 1:25:00 PM
From: Petz
 
John Walliker, here is a layman's-terms explanation of my earlier remark: "The percentage of time that 800 MB/sec is not enough may well be more than 2.5%. But I suspect that just one additional clock cycle of latency to arbitrate the RIMM channels is enough to make the bandwidth advantage meaningless."

I was trying to explain why the i840 dual-channel RAMBUS motherboard offers only a 2.5% speed improvement over a BX chipset motherboard with one fourth of the memory bandwidth.

There are two reasons -
1. Most of the time (>95%) the memory<->CPU bandwidth of the BX chipset (800 MB/sec) is more than adequate.
2. The latency (the time to get the first byte of data when filling a cache line in the CPU) is greater for RDRAM than for SDRAM, and it is probably longer still for dual-channel RDRAM than for single-channel.

Taking 1) and 2) together, the extra time the CPU spends waiting for the first byte of data negates the (sometimes) shorter wait times that occur once a stream of data has started; the rough arithmetic is sketched below.
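
Here is that arithmetic in code form. Every number is an illustrative guess, not a measurement of BX, i820 or i840: it just shows how a latency penalty on the common, latency-bound fills can cancel the bandwidth gain on the rare, bandwidth-bound ones.

/* Rough arithmetic behind 1) and 2): split cacheline fills into the rare
   bandwidth-bound case, where the wider RDRAM channel helps, and the
   common latency-bound case, where its extra first-word latency hurts.
   Every number below is an illustrative guess, not a measurement. */
#include <stdio.h>

int main(void)
{
    double frac_bw_bound  = 0.05;  /* fills where 800 MB/s is not enough    */
    double gain_bw_bound  = 0.30;  /* fraction of time saved on those fills */
    double frac_lat_bound = 0.95;  /* fills dominated by first-word latency */
    double loss_lat_bound = 0.05;  /* extra wait on those fills (5% slower) */

    double net = frac_bw_bound  * gain_bw_bound
               - frac_lat_bound * loss_lat_bound;
    printf("net change in memory stall time: %+.1f%%\n", 100.0 * net);
    return 0;
}

With these made-up fractions the latency penalty roughly cancels the bandwidth gain, which is the point: unless the bandwidth-bound fraction is large, even a small extra arbitration delay wipes out the advantage.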

Petz