Politics : Formerly About Advanced Micro Devices


To: Joe NYC who wrote (107738)4/25/2000 6:43:00 AM
From: Bilow
 
Hi Jozef Halada; The subject of latency is complicated enough that real engineers run simulations rather than listen to the flak put out by the various liars...

The hardwarecentral article is (more or less) the same as the one Samsung put out several years ago, when they were still flogging RDRAM (due to their lead over the other memory makers in that technology). They are now flogging DDR, as is everybody else, but only to design engineers. The public will start to see the products this summer, and they will be sold as heavily as Nvidia publicized its use of DDR.

The article does a comparison of the best available RDRAM against the most tepid available PC133. This is sort of traditional when hyping new technology. Some time ago, I posted a list of DDR and RDRAM part numbers currently available. Each maker puts out about a dozen different RDRAM chips, and they have different bandwidth and latency. Same with SDRAM. You can make the comparisons come out either way by choosing the right parts.

The Samsung article is shot through with other misrepresentations. For instance, they measure latency as the time to the first bits back from memory. But the interface to the processor is 64 bits wide, and Rambus only gives you 16 bits at a time. So you have to wait 3.75ns more in order to assemble enough bits to actually do something with it.
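The 3.75ns figure falls out of simple arithmetic; a quick sketch, assuming PC800 RDRAM's 1.25ns per 16-bit transfer and a 64-bit processor interface as described above:

```python
# Rough check of the extra assembly delay described above. Assumed
# figures: PC800 RDRAM delivers 16 bits per 1.25 ns transfer, and the
# processor interface wants a full 64-bit word.
BUS_WIDTH_BITS = 64        # width of the processor interface
RDRAM_WIDTH_BITS = 16      # bits delivered per RDRAM transfer
TRANSFER_NS = 1.25         # one PC800 transfer (800 MT/s)

transfers_needed = BUS_WIDTH_BITS // RDRAM_WIDTH_BITS   # 4 transfers
# The first transfer arrives at the quoted latency; the remaining
# three must land before a 64-bit word is usable.
extra_ns = (transfers_needed - 1) * TRANSFER_NS
print(extra_ns)  # 3.75
```

So a first-bits latency number quietly undercounts the wait by three transfer times.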

Another thing the Samsung article talks about is the granularity of the RDRAM interface. In actual use, this has to be multiplied by the number of bits you are going to move around at a time, similar to the latency issue. Sixteen bits just doesn't cut it in a memory controller; no processor on the planet executes instructions slowly enough to get by on that, for instance.

The two cycle addressing problem is the time required for the address bus to settle to a stable value. It's worst on the address bus because those lines have to go to every chip. The clock lines do too, but they don't have to carry information, and they generally end up better terminated. DDR fixes this problem by going to a newer voltage-level definition, SSTL2 (which probably uses more power than the old 3.3V LVTTL, though the chips as a whole use less power per bit, due to the voltage reduction).

Controllers for very large memory arrays will duplicate the address pins in order to minimize this problem. Of course that takes more controller pins. Naturally, the Rambus hypesters make latency calculations assuming you don't have the extra controller pins, but make pin-savings calculations assuming you had to use them. And when you do have a lot of memory chips, so that this problem is at its worst, it corresponds to the case in an RDRAM system where the extra delay due to trace propagation is worst. And that, naturally, isn't included in their latency calculations.

Real latency calculations would be so much more complicated than what shows up in these white papers as to be almost beyond belief. The real equations should allow choice of components, frequencies, topologies, number of chips, organization of chips, and also allow for various choices in the termination and connectors etc. Real life is complicated, and also depends on the ratio of cache misses, and how far away those misses go. (Making DRAM rows longer increases the chance of a row hit, thereby decreasing average latency, but also increases power consumption, for instance.)
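As a toy illustration of the kind of model Carl is describing (not any vendor's actual numbers -- the hit rates and timings below are all assumed), average latency depends on cache and DRAM-row hit rates as well as raw chip timings:

```python
# Illustrative average-latency model in the spirit of the post above.
# Every parameter value here is an assumption chosen for the sketch.
def avg_latency_ns(cache_hit_rate, cache_ns,
                   row_hit_rate, row_hit_ns, row_miss_ns):
    """Expected load latency given cache and DRAM-row hit rates."""
    dram_ns = row_hit_rate * row_hit_ns + (1 - row_hit_rate) * row_miss_ns
    return cache_hit_rate * cache_ns + (1 - cache_hit_rate) * dram_ns

# Longer DRAM rows raise the row hit rate, cutting the average latency
# even with identical miss timing -- the trade-off mentioned above
# (at the cost of extra power per activation).
base      = avg_latency_ns(0.95, 3.0, 0.40, 40.0, 70.0)
long_rows = avg_latency_ns(0.95, 3.0, 0.60, 40.0, 70.0)
```

The point is that two memory systems with identical datasheet latencies can land in very different places once the workload's hit rates enter the equation.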

Engineering is all about trade-offs, and it is complicated to an amazing degree. Rambus is all about hyping a stock. The management of some of their industry partners are thinking about making money off options rather than making products that offer the best price/performance, and this has dulled their decision making. AMD will probably continue to pick up market share for the next 18 months, and then become a completely entrenched major player in the x86 marketplace (and elsewhere), as it will take a long time for Intel to dig itself out of this.

-- Carl



To: Joe NYC who wrote (107738)4/25/2000 9:48:00 AM
From: FJB
 
Intel loosens timing spec to spur Rambus usage

...


But one RDRAM vendor, Samsung, denied that prices are artificially inflated, and said there is no great difference between the RDRAM selling price and its cost of production. "I've heard that 20 percent figure from Intel about 10 times," said Jay Hoon Chung, manager of DRAM marketing for Samsung Electronics Co. (Seoul, South Korea), currently the largest supplier of RDRAMs. "But 20 percent is not probable by our point of view. We expect the price gap will be 1.5x by the fourth quarter."


...

techweb.com



To: Joe NYC who wrote (107738)4/25/2000 11:49:00 AM
From: pgerassi
 
Dear Jozef:

The component latency is the time you would see if the RDRAM controller were directly next to the RDRAM component. The first example uses the best current RDRAM component timings against the maximum allowed SDRAM timings for the entire memory subsystem. This would be fair only if the xDRAM controllers were embedded on the CPU. The problem with this analysis arises when two or more RDRAM components are needed to get the amount of memory necessary. RDRAM components are chained together: the second component is connected to the first, which increases the latency by a fraction of a clock cycle even if the components are right next to each other. Current RDRAM modules have 4, 8, 12, and even 16 RDRAM components on them, and the length of the traces between modules also adds to latency. Since RDRAM comes in 8M byte or now 16M byte components, 16 or 8 components are needed for a 128M byte memory subsystem. This increases latency by as much as 2 to 8 cycles, or 10nsec. If the PC133 has a timing of CAS 2 (becoming more typical), the PC133 latency is reduced by 15nsec. In REAL cases, RDRAM latency exceeds SDRAM latency. That is why server vendors want to use SDRAM: the latency does not increase for very large arrays of memory.
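The chaining penalty above can be sketched numerically. The per-hop delay below is an assumption (a half cycle of PC800's 1.25ns clock), chosen only because it lands near the "as much as 10nsec" figure in the post:

```python
# Back-of-the-envelope chain delay for a daisy-chained RDRAM channel.
# A 128 MB array built from 8 MB parts needs 16 components (per the
# post); the half-cycle-per-hop figure is an assumption for the sketch.
PC800_CYCLE_NS = 1.25
HOP_CYCLES = 0.5            # assumed fraction of a clock per chained hop

def chain_delay_ns(n_components):
    # The farthest component sits behind n - 1 hops.
    return (n_components - 1) * HOP_CYCLES * PC800_CYCLE_NS

small = chain_delay_ns(4)    # small module: 4 components
big   = chain_delay_ns(16)   # 128 MB from 8 MB parts
```

An SDRAM DIMM's latency, by contrast, doesn't grow with the number of chips on the parallel bus, which is the server vendors' point.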

The second case refers to the typical PC, where the RDRAM controller sits on the hub and is connected to the CPU through the FSB (Front Side Bus). In this situation, subsequent cycles do not occur at the RDRAM interface but at the FSB interface, assuming that the RDRAM bandwidth is higher than the FSB bandwidth. Here the bottleneck for RDRAM is the FSB, which transfers, for Coppermines, one 8 byte transfer every 7.5nsec, or 133MHz. Thus the latency is the chain latency plus the hub latency plus the FSB latency, and the bandwidth is the bandwidth of the narrowest part of the chain, namely the FSB. In SDRAM systems, the latency bottleneck is the SDRAM itself, so faster SDRAM helps increase the overall usable transfer rate.

Thus CAS 2 SDRAM transfers 32 bytes, for a PIII, in 8 7.5nsec cycles for 60nsec total. A PC800 RDRAM 128MB module takes 30 1.25nsec cycles, plus 15 1.25nsec cycles for the chain, plus 3 1.25nsec cycles for synchronization, plus 1 7.5nsec cycle from the FSB to the hub, plus 4 7.5nsec cycles to transfer the data, which equals 11 7.5nsec cycles for 37.5nsec RDRAM. This is the best case for PC800 RDRAM. The typical case would use 45nsec PC800 RDRAM plus 6 additional 1.25nsec cycles for trace routing and 2 more 1.25nsec cycles for RDRAM controller delays, for a sum of 3 additional 7.5nsec cycles; thus the typical PC800 RDRAM transfer takes 14 7.5nsec cycles. A faster FSB and larger transfers, as on the Athlon, reduce the difference. For a 64 byte transfer, CAS 2 PC133 SDRAM takes 12 7.5nsec cycles, while the best PC800 RDRAM takes 15 5nsec cycles. Thus SDRAM takes 90nsec per 64 byte transfer, the best RDRAM takes 75nsec, and typical RDRAM takes 90nsec. In this case RDRAM beats SDRAM.
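The bottom-line 64-byte figures above reduce to two multiplications; restated as a sketch using only the cycle counts Pete quotes:

```python
# Pete's closing figures for one 64-byte transfer, restated:
# CAS-2 PC133 SDRAM needs 12 FSB cycles of 7.5 ns, while the best
# PC800 RDRAM needs 15 cycles of 5 ns (typical RDRAM lands at 90 ns).
FSB_CYCLE_NS = 7.5

sdram_ns      = 12 * FSB_CYCLE_NS   # CAS-2 PC133 SDRAM
best_rdram_ns = 15 * 5.0            # best-case PC800 RDRAM
```

So the claimed best-case RDRAM advantage is 15ns per 64-byte transfer, and it evaporates entirely in the typical case.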

Now if we use PC1600 DDRDRAM, it would take just 70nsec, and PC2100 DDRDRAM takes 60nsec. PC2100 DDRDRAM would be the fastest.

I hope this clears this up.

Pete