Technology Stocks : Rambus (RMBS) - Eagle or Penguin

To: Dan3 who wrote (47210), 7/14/2000 3:26:50 PM
From: Tenchusatsu
 
Dan, < There is one 7.5 ns clock for Row select then (cas 2 @ 133) 15 ns for a total of 22.5 ns or clocks before data is available.>

More proof that you don't know what you're talking about. Row select alone costs you RCD, i.e., the delay in nanoseconds between the row and column commands. For '-7' markings (i.e., 2-2-2), RCD is 15 ns. For '-75' markings (i.e., 3-2-2), RCD is 20 ns, which rounds up to 22.5 ns because each clock is 7.5 ns. So in the absolute best case (2-2-2), latency is 15 + 15 = 30 ns, not 22.5. Sure, that's a little lower than the best case for RDRAM, but we are talking about the absolute best case for DDR. The vast majority of PC133 parts are CAS 3, which isn't even as good as 3-2-2. What makes you think the DRAM guys all of a sudden found a magical formula to yield DDR parts at even better latencies? (Hint: they haven't, which is why some will yield only at DDR-200, some will be DDR-266 3-3-3, some will be DDR-266 3-2-2, and only a few will be the best-case DDR-266 2-2-2.)
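
To make that arithmetic explicit, here's a quick sketch (plain Python, my own back-of-the-envelope using only the 7.5 ns clock and the two speed grades quoted above, not a full datasheet):

# First-data latency = tRCD + CAS latency, both counted in 7.5 ns (133 MHz) clocks.
CLOCK_NS = 7.5

grades = {
    "'-7'  (2-2-2)": (2, 2),   # (tRCD, CAS) in clocks
    "'-75' (3-2-2)": (3, 2),
}

for name, (trcd, cas) in grades.items():
    clocks = trcd + cas
    print(f"{name}: {clocks} clocks = {clocks * CLOCK_NS} ns to first data")

# '-7'  (2-2-2): 4 clocks = 30.0 ns  <- the absolute best case
# '-75' (3-2-2): 5 clocks = 37.5 ns

Dan's 22.5 ns only falls out if you charge a single clock for the row phase, which no 2-2-2 part actually gives you.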

And besides, all this translates into a 10 nsec difference in latency at best. The design of a desktop chipset itself can make a difference of 10 to 30 nsec in latency. That's why I have always made the assertion that the design of the chipset matters much more than the core latency of DRAM.

<DDR 266 runs at the same 133MHZ frequency as PC133 - not the 400MHZ of Rambus. There is no obvious reason why DDR should be more difficult to implement or require more board space than other 133MHZ traces. Your claim that it will require twice the space of other 133MHZ runs makes no sense - why should it?>

Oh, let's see, because the data is being DOUBLE-PUMPED? Sure, the control and address lines run at 133 MHz, but the data lines transfer at an effective 266 MHz. And DDR's electricals are less well-behaved than RDRAM's; well-controlled signaling is exactly why RDRAM can run at 800 MHz in the first place. DDR will require more robust electricals, perhaps as robust as RDRAM's if not more. (I draw this conclusion partly from the fact that some of the "learning" required for RDRAM implementation can also be applied to DDR's challenges, and partly from the likelihood that DDR-based motherboards will require six layers, vs. four layers for boards built on the 820 chipset.)
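
To put numbers on the double-pumping point (simple arithmetic on the nominal clock rates above, nothing measured):

# Per-bit timing window on the data lines; double-pumping halves it.
def bit_window_ns(clock_mhz, double_pumped):
    period_ns = 1000.0 / clock_mhz
    return period_ns / 2 if double_pumped else period_ns

print("PC133 data:   %.2f ns per bit" % bit_window_ns(133, False))  # ~7.52 ns
print("DDR-266 data: %.2f ns per bit" % bit_window_ns(133, True))   # ~3.76 ns
print("PC800 RDRAM:  %.2f ns per bit" % bit_window_ns(400, True))   # 1.25 ns

Half the window means half the skew and settling budget on every data line, which is why DDR routing rules get tighter than plain PC133 even though the clock frequency is the same.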

<See above - I'm talking cpu clocks and the 40 is taken straight from the Rambus specification. If you have a problem with those numbers then take it up with Rambus!>

No, you never made it clear that you were talking about CPU clocks. Here's what you said:

<Dual DDR channels will fill the 256 bit cache line of coppermine in a two memory bus clocks, with the first 32 bytes ready in a half memory bus clock. DDR in sampling video cards is already running at a 333MHZ rate (166 clock, so called PC2600) - at cas 2 that's 21 clocks at 1 GHZ till there are 16 bytes in the cache, and 9 more to fill the line. So it's 21 clocks to end the stall and 30 clocks to fill a line. Dual PC800 rambus takes a flat 40 clocks to read the cells and put the packet together then another 1.5 clocks to put the first 4 bytes into the cpu, with 10.5 more needed to fill the line for a total of 52 clocks to fill a cache line.>

You switch between memory clocks and something about "1 GHz" clocks, which I'm supposed to translate as processor clocks? Sorry, Dan, but it's not my fault (nor Rambus') if you can't make a coherent argument.

<Wide memory channels fill wide cache lines in fewer bus cycles.>

Sure, if the speed is the same. But the speeds aren't the same; RDRAM transfers data at a much higher rate per pin than DDR. That's why RDRAM can squeeze the data through a 16-bit channel. DDR already needs a 64-bit channel just to brute-force its burst rate above RDRAM's, and even then, DDR is less efficient at using that bandwidth.
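
A rough peak-bandwidth comparison of the two channel geometries (back-of-the-envelope only, ignoring protocol overhead and efficiency, which is exactly where DDR gives some of it back):

# Peak bandwidth = transfer rate (MT/s) x channel width (bytes).
def peak_mb_s(mt_per_s, width_bits):
    return mt_per_s * width_bits / 8

print("PC800 RDRAM, 16-bit channel:", peak_mb_s(800, 16), "MB/s")  # 1600 MB/s
print("DDR-266, 64-bit channel:    ", peak_mb_s(266, 64), "MB/s")  # 2128 MB/s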

<At 800MHZ, dual channel rambus moves 4 bytes in each memory bus cycle - it's actually 1.24 clocks, but I've never read of better than half cycle latching, so I called it 1.5. I suppose that some chipset/CPUs wait for an entire cache line to be filled before accepting data (and ending the stall that led to the request in the first place) but my understanding was that current Intel and AMD systems were brighter than that. Am I right or wrong?>

RDRAM transfers data in packets of 16 bytes. I'm guessing that with Intel's 820 chipset, it must wait for all 16 bytes to come in before starting the data transfer on the FSB. That translates to 10 nsec. Then the next 16 bytes will come in from RDRAM faster than the first 16 bytes can go out over the FSB, so no matter what, we're limited by the FSB. In the case of dual-channel RDRAM (the 840 chipset), 32 bytes are transferred at a time from RDRAM. The chipset must wait for all 32 bytes before sending them over the FSB. But that also translates into 10 nsec, and we're still limited by the FSB. (By the way, a P6 cache line is 32 bytes, if you didn't know, so a ton of the potential RDRAM bandwidth in the 840 chipset is left unused. Tehama should make much better use of the RDRAM bandwidth thanks to the quad-pumped Willamette bus.)

The same problem will hit DDR. Because data is double-pumped and source-synchronous, the chipset will have to wait until two data transfers come in (i.e. 16 bytes) before sending it to the FSB. Thanks to the 133 MHz clock of DDR-266, that translates to 7.5 nsec, slightly less than RDRAM, but not enough to make much of a difference.
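
Putting those two paragraphs into numbers (again just arithmetic from the rates above; the assumption that the chipset buffers a full 16 bytes before forwarding is the same guess I made for the 820):

# Time for the chipset to accumulate 16 bytes before forwarding to the FSB.
def fill_16_bytes_ns(width_bits, mt_per_s):
    beats = 16 / (width_bits / 8)       # data transfers needed for 16 bytes
    return beats * 1000.0 / mt_per_s    # ns per transfer * number of transfers

print("PC800 RDRAM (16-bit @ 800 MT/s): %.1f ns" % fill_16_bytes_ns(16, 800))  # 10.0 ns
print("DDR-266    (64-bit @ 266 MT/s): %.1f ns" % fill_16_bytes_ns(64, 266))   # ~7.5 ns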

<what's that [AMD's LDT technology] got to do with moving data between main memory and CPU?>

Because most of your argument rests on the notion that packetizing data increases latency way too much, and that for moving data, "wider is better." But LDT goes against the grain of your logic with its narrow channels and the requirement of packetizing data. No matter whether it's a chip-to-chip interconnect like LDT, or a DRAM-to-chipset interconnect like Rambus, or a chipset-to-processor interconnect, or even a processor-to-L2 interconnect, it's all the same game of moving data from one place to another.

We can go on and on, Dan, but I simply do not have the time to correct all your little inaccuracies and mistakes in logic. No matter what, you'll be convinced that RDRAM is a bad technology, even if I prove otherwise beyond a shadow of a doubt. And that's just the way you'll always see things. At least I try and keep an open mind.

Tenchusatsu