To: Bilow who wrote (107735), 4/25/2000 6:12:00 AM
From: Joe NYC
Carl, I have a very basic question about RDRAM vs. SDRAM. Here are two articles on the web. The first is from hardwarecentral.com:

SDRAM performance is actually measured with two metrics: bandwidth and latency. Surprisingly, RDRAM not only offers higher bandwidth, but its latency is also improved relative to SDRAM. What may be even more surprising is that PC133 SDRAM latency is worse than PC100 SDRAM latency.

How is component latency defined? The accepted definition of latency is the time from the moment the RAS (Row Address Strobe) is activated (the ACT command is sampled) to the moment the first data bit becomes valid. Synchronous device timing is always a multiple of the device clock period.

The fundamental latency of a DRAM is determined by the speed of the memory core. All SDRAMs use the same memory core technology, so all SDRAMs are subject to the same core latency. Any differences in latency between SDRAM types are therefore only the result of differences in the speed of their interfaces.

With its 400 MHz data bus, the RDRAM interface operates with an extremely fine timing granularity of 1.25 ns, resulting in a component latency of 38.75 ns. The PC100 SDRAM interface runs with a coarse timing granularity of 10 ns. Its interface timing matches the memory core timing very well, so its component latency ends up being 40 ns. The PC133 SDRAM interface, with its coarse timing granularity of 7.5 ns, incurs a mismatch with the timing of the memory core that increases the component latency significantly, to 45 ns.

The latency timing values can be computed easily from the device data sheets. For PC100 and PC133 SDRAMs, the component latency is the sum of the tRCD and CL values. The RDRAM's component latency is the sum of the tRCD and tCAC values, plus one half clock period for the data to become valid.

Although component latency is an important factor in system performance, system latency is even more important, since it is system latency that actually limits overall performance. System latency is determined by adding external address and data delays to the component latency. For PCs, system latency is measured as the time to return 32 bytes of data, also referred to as the "cache line fill" data, to the CPU.

In a system, SDRAMs suffer from what is known as the two-cycle addressing problem: the address must be driven for two clock cycles (20 ns at 100 MHz) to give the signals time to settle on the SDRAM's heavily loaded address bus. After the two-cycle address delay and the component latency, three more clocks are required to return the rest of the 32 bytes of data. System latency therefore adds five clocks to the component latency:

40 + (2 x 10) + (3 x 10) = 90 ns for PC100 SDRAM
45 + (2 x 7.5) + (3 x 7.5) = 82.5 ns for PC133 SDRAM

The superior electrical characteristics of the RDRAM channel eliminate the two-cycle addressing problem; only 10 ns is required to drive the address to the RDRAM. The 32 bytes of data are transferred back to the CPU at 1.6 GB/s, which works out to 18.75 ns. Adding in the component latency:

38.75 + 10 + 18.75 = 67.5 ns for PC800 RDRAM

Measured at either the component or the system level, RDRAM has the lowest latency. Surprisingly, due to the mismatch between its interface and core timing, PC133 SDRAM's component latency is significantly higher than PC100's. The RDRAM's low latency, coupled with its 1.6 GB/s bandwidth, provides the highest possible sustained system performance.
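[An aside from me before the quote continues: the article's numbers do add up if you plug in typical part timings. Here is a quick Python sketch. Note that the cycle counts (tRCD = CL = 2 for PC100, tRCD = CL = 3 for PC133, tRCD = 7 and tCAC = 8 RDRAM clocks for PC800) are my assumptions for typical parts of that era, not figures the article states directly:]

    # Component and system latency per the first article's method.

    def sdram_latency_ns(clock_ns, trcd_cycles, cl_cycles):
        # component latency = (tRCD + CL) interface clocks
        component = (trcd_cycles + cl_cycles) * clock_ns
        # system latency adds 2 clocks of address time (the "two-cycle
        # addressing problem") plus 3 more clocks to finish the 4-beat,
        # 32-byte cache line fill (the first beat is already in component)
        system = component + (2 + 3) * clock_ns
        return component, system

    print(sdram_latency_ns(10.0, 2, 2))  # PC100 -> (40.0, 90.0)
    print(sdram_latency_ns(7.5, 3, 3))   # PC133 -> (45.0, 82.5)

    # PC800 RDRAM: 400 MHz clock (2.5 ns), data on both edges (1.25 ns/beat)
    clock_ns, beat_ns = 2.5, 1.25
    component = (7 + 8) * clock_ns + clock_ns / 2  # tRCD + tCAC + half clock = 38.75 ns
    # one 10 ns address step, then 15 more 2-byte beats to finish 32 bytes
    system = component + 10.0 + 15 * beat_ns       # = 67.5 ns
    print(component, system)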
The article continues:

From a performance point of view, we must note that L1 and L2 cache hits and misses contribute greatly to memory architecture performance. Individual programs also vary in how they use memory, and so stress it differently. For example, a program doing random database searches across a large chunk of memory will thrash the caches, and the memory architecture with the lowest latency will have the advantage. On the other hand, large sequential memory transfers with little CPU processing can easily saturate SDRAM bandwidth; RDRAM will have the advantage here with its higher bandwidth. For code that fits nicely within the L1/L2 caches, memory type will have virtually no impact at all.

Here is another one, which contradicts it. From aceshardware.com:

For example, the peak bandwidth of RAMBUS PC800 is 1600 MB/s. But with random memory accesses the first 8 bytes arrive after 11 cycles, and a 32-byte transfer (to transmit a 32-byte cache line to the CPU) typically takes 11-1-1-1 cycles, or 14 cycles total. If the FSB runs at 133 MHz, the bandwidth for random memory accesses to the CPU is 32 bytes x 133 MHz / 14 = 304 MB/s.

PC133 SDRAM will do better in those circumstances (random accesses). It takes 7-1-1-1 cycles to transfer a 32-byte line to the CPU's cache, so the CPU will receive 32 bytes x 133 MHz per 10 clock cycles = 428 MB/s.

If the memory accesses are more sequential, however, the initial latency is not so important. For example, if we can read 64 bytes sequentially, we have 11-1-1-1 for the first 32 bytes but only 4 cycles (1-1-1-1, simplified) for the next 32 bytes, so the bandwidth comes closer to the peak: 64 bytes x 133 MHz / 18 cycles = 473 MB/s. Bursts of memory traffic with sequential accesses lower the influence of the initial latency, and the average bandwidth to the CPU rises.

So my questions: What is this "component latency" the first article refers to, and why would it be higher with PC133 than with PC100? The second article arrives at completely different results for how long it takes to deliver 32 bytes of data. Which one do you think is correct? And while you are at it, what is the two-cycle addressing delay the first article mentions?

Thanks,
Joe

(Edit: I guess I should have posted this in the RMBS thread.)
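P.S. For anyone who wants to check the second article's arithmetic, here is the same calculation as a few lines of Python. The 133 MHz FSB and the 11-1-1-1 / 7-1-1-1 burst patterns come from the article; the function is just its stated formula, so treat this as a sketch of the article's method rather than a real memory model:

    # Effective bandwidth to the CPU per the second article's method.

    FSB_MHZ = 133  # front-side bus clock, as in the article

    def effective_mb_per_s(num_bytes, bus_cycles):
        # bytes per burst divided by the time the burst occupies the bus;
        # bytes * MHz / cycles comes out directly in (decimal) MB/s
        return num_bytes * FSB_MHZ / bus_cycles

    print(effective_mb_per_s(32, 11 + 1 + 1 + 1))  # RDRAM, random 32-byte line: 304 MB/s
    print(effective_mb_per_s(32, 7 + 1 + 1 + 1))   # PC133, random: ~426 MB/s (article rounds to 428)
    print(effective_mb_per_s(64, 14 + 4))          # RDRAM, 64 bytes sequential: ~473 MB/s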