Dear Jozef:
The component latency is the latency you would see if the RDRAM controller were directly next to a single RDRAM component. The first example uses the best current RDRAM component timings against the maximum allowed SDRAM timings for the entire memory subsystem. That comparison would be fair only if the xDRAM controller were embedded on the CPU. The problem with this analysis is that two or more RDRAM components are needed to reach the amount of memory necessary. RDRAM components are chained together: the second component is connected to the first, and each hop increases the latency by a fraction of a clock cycle even if the components are right next to each other. Current RDRAM modules carry 4, 8, 12, or even 16 RDRAM components, and the length of the traces between modules adds still more latency. Since RDRAM comes in 8M byte or now 16M byte components, 16 or 8 components are needed for a 128M byte memory subsystem. This increases latency by as much as 2 to 8 cycles, or 10nsec. If the PC133 has a timing of CAS 2 (becoming more typical), the PC133 latency is reduced by 15nsec. In REAL cases, RDRAM latency exceeds SDRAM latency. That is why server vendors want to use SDRAM: its latency does not increase for very large arrays of memory.
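To make the chain arithmetic concrete, here is a small sketch. The component counts come from the figures above; the half-cycle-per-hop penalty is my own illustrative assumption, not a datasheet number, so treat the chain figure as a rough order of magnitude only:

```python
# Rough sketch of the RDRAM chain-depth arithmetic from the letter.
# The per-hop penalty is an assumed illustrative value, not a datasheet figure.

RDRAM_CYCLE_NS = 1.25  # one PC800 RDRAM clock period

def components_needed(subsystem_mbytes, component_mbytes):
    """How many RDRAM components a memory subsystem of this size requires."""
    return subsystem_mbytes // component_mbytes

def chain_penalty_ns(n_components, cycles_per_hop=0.5):
    """Extra latency from chaining components; 0.5 cycle/hop is an assumption."""
    hops = n_components - 1  # the first component adds no hop
    return hops * cycles_per_hop * RDRAM_CYCLE_NS

print(components_needed(128, 8))    # 16 components of 8M bytes
print(components_needed(128, 16))   # 8 components of 16M bytes
print(chain_penalty_ns(16))         # roughly 9.4 nsec for a 16-deep chain
```

With the assumed per-hop cost, a 16-deep chain lands in the same neighborhood as the 10nsec worst case quoted above.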
The second case is the typical PC, where the RDRAM controller sits on the hub and is connected to the CPU through the FSB (Front Side Bus). In this situation, subsequent cycles do not occur at the RDRAM interface but at the FSB interface, assuming that the RDRAM bandwidth is higher than the FSB bandwidth. Here the bottleneck for RDRAM is the FSB, which for Coppermines moves one 8 byte transfer every 7.5nsec, i.e. 133Mhz. Thus the latency is the chain latency plus the hub latency plus the FSB latency, and the bandwidth is the bandwidth of the narrowest link in the chain, namely the FSB. The overall transfer time is therefore the RDRAM latency plus the time to push the data across the FSB. In SDRAM systems, the latency bottleneck is the SDRAM itself, so faster SDRAM increases the overall usable transfer rate. Thus CAS 2 SDRAM transfers 32 bytes, for a PIII, in 8 cycles of 7.5nsec, or 60nsec total. A PC800 RDRAM 128MB module takes 30 cycles of 1.25nsec, plus 15 cycles of 1.25nsec for the chain, plus 3 cycles of 1.25nsec for synchronization, plus 1 cycle of 7.5nsec from the FSB to the hub, plus 4 cycles of 7.5nsec to transfer the data, which comes to 11 cycles of 7.5nsec using 37.5nsec RDRAM parts. This is the best case for PC800 RDRAM. The typical case would use 45nsec PC800 RDRAM plus 6 additional 1.25nsec cycles for trace routing and 2 more 1.25nsec cycles for RDRAM controller delays, for roughly 3 additional 7.5nsec cycles. Thus the typical PC800 RDRAM transfer takes 14 cycles of 7.5nsec. A faster FSB and larger transfers, as on the Athlon, reduce the difference: CAS 2 PC133 SDRAM takes 12 cycles of 7.5nsec, while the best PC800 RDRAM takes 15 cycles of 5nsec. Thus SDRAM takes 90nsec per 64 byte transfer, the best RDRAM takes 75nsec per 64 byte transfer, and typical RDRAM takes 90nsec per 64 byte transfer. In this case the best RDRAM beats SDRAM, and typical RDRAM ties it.
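Taking the Coppermine-case cycle counts above at face value, the totals fall out of simple multiplication (this just restates the letter's figures, it does not re-derive them):

```python
# Convert the letter's Coppermine-case cycle counts into nanoseconds.
FSB_NS = 7.5  # Coppermine FSB: one 8 byte transfer per 7.5 nsec (133MHz)

def transfer_ns(fsb_cycles, period_ns=FSB_NS):
    """Total time for a transfer quoted as a number of FSB cycles."""
    return fsb_cycles * period_ns

def fsb_data_cycles(bytes_moved, bytes_per_transfer=8):
    """FSB cycles needed just to move the data payload."""
    return bytes_moved // bytes_per_transfer

print(fsb_data_cycles(32))   # 4 cycles to move 32 bytes over the FSB
print(transfer_ns(8))        # 60.0 nsec: CAS 2 SDRAM, 32 bytes, PIII
print(transfer_ns(11))       # 82.5 nsec: best-case PC800 RDRAM, 11 cycles
print(transfer_ns(14))       # 105.0 nsec: typical PC800 RDRAM, 14 cycles
```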
Now if we used PC1600 DDR SDRAM, the same transfer would take just 70nsec, and PC2100 DDR SDRAM would take 60nsec. PC2100 DDR SDRAM would be the fastest.
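Putting the 64 byte Athlon-case figures side by side (all times are the ones quoted above; the table itself is just for illustration):

```python
# Per-64-byte transfer times quoted in the letter (Athlon-class FSB).
transfer_ns_per_64b = {
    "PC133 SDRAM (CAS 2)":   12 * 7.5,   # 90.0 nsec, 12 cycles of 7.5 nsec
    "PC800 RDRAM (best)":    15 * 5.0,   # 75.0 nsec, 15 cycles of 5 nsec
    "PC800 RDRAM (typical)": 90.0,       # quoted directly in the letter
    "PC1600 DDR SDRAM":      70.0,
    "PC2100 DDR SDRAM":      60.0,
}

# Print fastest first.
for name, ns in sorted(transfer_ns_per_64b.items(), key=lambda kv: kv[1]):
    print(f"{name:24s} {ns:5.1f} nsec")
```

Sorted this way, PC2100 DDR SDRAM comes out on top, matching the conclusion above.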
I hope this clears this up.
Pete