Politics : Formerly About Advanced Micro Devices


To: Bilow who wrote (107735), 4/25/2000 5:57:00 AM
From: Joe NYC
Carl,

P.S. I hope that Via does a better job with their DDR chipsets, but I guess that other companies are working on competing sets anyway...

Well, AMD is working on it as well. I hope we will get a good performing chipset from AMD. AMD has a strong incentive to do very well, because the chipset can make an important difference in the benchmarks.

Joe



To: Bilow who wrote (107735), 4/25/2000 6:12:00 AM
From: Joe NYC
Carl,

I have a very basic question about RDRAM vs. SDRAM. Here are 2 articles on the web:

hardwarecentral.com

SDRAM performance is actually measured with two metrics: bandwidth and latency. Surprisingly, RDRAM not only offers higher bandwidth, but its latency is also better than SDRAM's. What may be even more surprising is that PC133 SDRAM's latency is worse than PC100 SDRAM's.

How is component latency defined? The accepted definition of latency is the time between the moment the RAS (Row Address Strobe) is activated (ACT command sampled) to the moment the first data bit becomes valid. Synchronous device timing is always a multiple of the device clock period.

The fundamental latency of a DRAM is determined by the speed of the memory core. All SDRAMs use the same memory core technology, so all SDRAMs are subject to the same latency. Any differences in latency between SDRAM types are therefore only the result of the differences in the speed of their interfaces.

On its 400 MHz data bus, the RDRAM interface operates with an extremely fine timing granularity of 1.25 ns, resulting in a component latency of 38.75 ns. The PC100 SDRAM interface runs with a coarse timing granularity of 10 ns. Its interface timing matches the memory-core timing very well, so its component latency ends up being 40 ns. The PC133 SDRAM interface, with its coarse timing granularity of 7.5 ns, incurs a mismatch with the timing of the memory core that increases the component latency significantly, to 45 ns.

The latency timing values can be computed easily from the device data sheets. For the PC100 and PC133 SDRAMs, the component latency is the sum of the tRCD and CL values. The RDRAM's component latency is the sum of the tRCD and TCAC values, plus one half clock period for the data to become valid.
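These component-latency sums are easy to reproduce. The sketch below uses assumed datasheet-style values (tRCD = CL = 2 cycles for PC100, tRCD = CL = 3 cycles for PC133, and tRCD = 7 plus tCAC = 8 clocks of 2.5 ns for PC800 RDRAM) chosen purely so the totals match the article's figures; real parts may differ.

```python
# Component latency for SDRAM: (tRCD + CL) cycles at the interface clock period.
# For RDRAM: tRCD + tCAC, plus half a clock for the data to become valid.
# All timing parameters here are illustrative assumptions, not datasheet values.

def sdram_component_latency(clock_ns, trcd_cycles, cl_cycles):
    """SDRAM component latency in ns: (tRCD + CL) in units of the clock period."""
    return clock_ns * (trcd_cycles + cl_cycles)

def rdram_component_latency(clock_ns, trcd_cycles, tcac_cycles):
    """RDRAM component latency in ns: tRCD + tCAC plus half a clock period."""
    return clock_ns * (trcd_cycles + tcac_cycles) + clock_ns / 2

pc100 = sdram_component_latency(10.0, trcd_cycles=2, cl_cycles=2)   # 40.0 ns
pc133 = sdram_component_latency(7.5, trcd_cycles=3, cl_cycles=3)    # 45.0 ns
pc800 = rdram_component_latency(2.5, trcd_cycles=7, tcac_cycles=8)  # 38.75 ns
print(pc100, pc133, pc800)
```

Note how PC133's coarser mismatch shows up directly: six 7.5 ns ticks (45 ns) versus four 10 ns ticks (40 ns) for PC100, even though the cores are the same speed.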

Although component latency is an important factor in system performance, system latency is even more important, since it is system latency that reduces overall performance. System latency is determined by adding external address and data delays to the component latency. For PCs, the system latency is measured as the time to return 32-bytes of data, also referred to as the 'cache line fill' data, to the CPU.

In a system, SDRAMs suffer from what is known as the two-cycle addressing problem. The address must be driven for two clock cycles (20ns at 100 MHz) in order to provide time for the signals to settle on the SDRAM's highly loaded address bus. After the two-cycle address delay and the component delay, three more clocks are required to return the 32 bytes of data. The system latencies of PC100 and PC133 SDRAM therefore add five clocks to the component latency. The total SDRAM system latency is:

40 + (2 x 10) + (3 x 10) = 90ns for PC100 SDRAM
45 + (2 x 7.5) + (3 x 7.5) = 82.5ns for PC133 SDRAM

The superior electrical characteristics of a RDRAM eliminate the two-cycle addressing problem, requiring only 10ns to drive the address to the RDRAM. The 32 bytes of data are transferred back to the CPU at 1.6 GB/second, which works out to be 18.75ns. Adding in the component latency, the RDRAM system latency is:

38.75 + 10 + 18.75 = 67.5ns for PC800 RDRAM

Measured at either the component or the system level, RDRAM has the lowest latency. Surprisingly, due to the mismatch between its interface and core timing, PC133 SDRAM's component latency is significantly higher than PC100 SDRAM's. RDRAM's low latency, coupled with its 1.6 gigabyte-per-second bandwidth, provides the highest possible sustained system performance.

From a performance point of view, we must note that L1 and L2 cache hits and misses contribute greatly to memory-architecture performance. Individual programs also vary in how they use memory, and so stress the memory system differently. For example, a program doing random database searches over a large chunk of memory will 'thrash' the caches, and the memory architecture with the lowest latency will have the advantage. On the other hand, large sequential memory transfers with little need for CPU processing can easily saturate SDRAM bandwidth; here RDRAM, with its higher bandwidth, will have the advantage. For code that fits nicely within the L1/L2 caches, memory type will have virtually no impact at all.


Here is another one, which contradicts it:
aceshardware.com

For example, the peak bandwidth of RAMBUS PC800 is 1600 MB/s. But with random memory accesses the first 4 bytes arrive after 11 cycles, and typically a 32 byte transfer (to transmit a 32 byte cache line of data to the CPU) takes 11-1-1-1 cycles or 14 cycles. If the FSB runs at 133 MHz, the bandwidth for random memory accesses to the CPU is 32 bytes x 133 MHz / 14 = 304 MB/s.

SDRAM PC133 will do better in those circumstances (random accesses). It takes 7-1-1-1 cycles to transfer a 32 byte line to the CPU's cache, so the CPU will receive (32 bytes x 133 MHz) per 10 clock cycles = 428 MB/s. If the memory accesses are more sequential, however, the initial latency will not matter as much. For example, if we can read 64 bytes sequentially, we have 11-1-1-1 for the first 32 bytes but only 4 cycles (1-1-1-1, simplified) for the next 32 bytes, so the bandwidth comes closer to the peak: 64 bytes x 133 MHz / 18 cycles = 473 MB/s. Bursts of memory traffic with sequential accesses lower the influence of the initial latency, and the average bandwidth to the CPU rises.
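The second article's effective-bandwidth figures follow from cycle counts alone, so they are easy to recompute. A quick check at the article's 133 MHz FSB (note that 32 bytes in 10 cycles actually works out to about 426 MB/s, so the quoted 428 MB/s looks like a small rounding slip):

```python
# Effective bandwidth in MB/s (MB = 1e6 bytes):
# bytes transferred per burst, divided by the burst time in FSB cycles.
def effective_bandwidth_mb_s(bytes_transferred, cycles, fsb_mhz=133):
    return bytes_transferred * fsb_mhz / cycles

rdram_random = effective_bandwidth_mb_s(32, 14)  # 11-1-1-1 burst, ~304 MB/s
sdram_random = effective_bandwidth_mb_s(32, 10)  # 7-1-1-1 burst, ~426 MB/s
rdram_seq    = effective_bandwidth_mb_s(64, 18)  # 11-1-1-1 then 1-1-1-1, ~473 MB/s
print(rdram_random, sdram_random, rdram_seq)
```

The crossover is the whole point of the second article: under random access the fixed 11-cycle lead-in dominates RDRAM's burst, while longer sequential bursts amortize it away.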


What is this "component latency" referred to in the first article, and why would it be higher for PC133 than for PC100?

The other article gives completely different results for how long it takes to deliver 32 bytes of data. Which one do you think is correct?

And while you are at it, what is this two-cycle addressing delay the first article refers to?

Thanks

Joe

(Edit: I guess I should have posted in RMBS thread)