Hi John Walliker; Thanks for the compliment on my understanding of Rambus. (I have to take these as I get them...)

Re: However, there is another possibility along these lines. Suppose that a chip 8.75cm from the controller sends (at least) two successive transmissions. On the second transmission there will be a superposition of the reflected and the current transmission, doubling the voltage. This doubled voltage will propagate back towards the termination. Now suppose that another device 17.5cm nearer the termination (26.25cm from the controller) starts transmitting on the next clock edge. This device will see the superposition of the first transmission reflected from the controller, the second transmission, and its own transmission, bringing the level to 1.5 times the nominal Rambus signalling level. The signal is now below the lowest voltage at which the output characteristics of the drivers are specified, so while it may still work, the margins will be much lower. The same thing would also happen about 43.75cm from the controller.
The above thought came to me as well. Then I realized that in order for this to happen, it must also be the case that the data outputs from the two chips would collide at the controller. In other words, to get the superposition above, you would also have to have a data bus collision. Because of this, I concluded that it wasn't a realistic scenario, but you may be right.
For instance, if the first driving chip was farther from the controller than the second driving chip, and they were clocked simultaneously, then the effect would happen as follows: The second chip would be placing data on the bus at the same time that data (placed earlier on the bus) was propagating past it, as well as reflected data placed much earlier on the bus. The second chip's data would then arrive at the controller at the same time as the first chip's.
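Just to put numbers on the timing, here is a rough sketch (plain Python, back-of-the-envelope). The propagation speed is my assumption, picked so that the 17.5cm spacing corresponds to one 1.25ns bit time; the distances are the ones from the scenario above:

    # Rough check of when each chip's data reaches the controller.
    # Assumption: propagation speed such that 17.5 cm takes one 1.25 ns
    # bit time (about 14 cm/ns, plausible for a PCB trace).

    BIT_TIME_NS = 1.25
    CM_PER_NS   = 17.5 / BIT_TIME_NS   # assumed propagation speed

    def arrival_at_controller(distance_cm, launch_edge):
        """Time at which a chip's wavefront reaches the controller."""
        return launch_edge * BIT_TIME_NS + distance_cm / CM_PER_NS

    far_chip  = arrival_at_controller(26.25, launch_edge=0)  # drives first
    near_chip = arrival_at_controller(8.75,  launch_edge=1)  # drives one edge later

    print(far_chip, near_chip)   # both 1.875 ns -> they collide at the controller

With those assumptions the two wavefronts land on the controller in the same bit slot, which is the collision I was talking about.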
The traditional way of eliminating this problem is to put a dead time of a cycle (i.e. 1.25ns) between one chip's output driving the bus and the next one's. I haven't looked to see whether Rambus decided to try to finesse that dead time away, but it is possible... In fact, the more I think about it, I bet that they did cut that corner...
The way you would cut that particular corner is by requiring the wave fronts from the first chip to end at the second chip just as the wave front from the second chip begins. The two signals would then not collide at the controller. In addition, the second chip would not have to drive a "1" into a full "1" already on the bus. You would do that by routing the master clock, the one that controls the read wave fronts, backwards. That is, the clock driver would be closest to the last Rambus chip, and the clock would then be snaked back through them towards the controller. I noticed that the master clock was routed this way, so I am beginning to think that that is what Rambus did. I suppose I could go look in the spec...
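Here is a sketch of what that backwards-routed read clock buys you, again with made-up numbers (the channel length and speed below are just for illustration, not from the spec). The point is that the arrival time at the controller depends only on the clock edge number, not on which chip drove the bit, so hand-offs between chips land in consecutive bit slots instead of colliding:

    # The read clock edge travels from the far end of the channel towards
    # the controller; each chip drives its read data on the edge it sees
    # locally. Clock flight in plus data flight back is a constant.

    L_CM        = 35.0     # assumed channel length
    CM_PER_NS   = 14.0     # assumed propagation speed (17.5 cm per bit time)
    BIT_TIME_NS = 1.25

    def arrival_at_controller(distance_cm, edge):
        clock_delay = (L_CM - distance_cm) / CM_PER_NS   # clock reaching the chip
        data_flight = distance_cm / CM_PER_NS            # data flying back
        return edge * BIT_TIME_NS + clock_delay + data_flight

    for d in (8.75, 17.5, 26.25):
        print(d, arrival_at_controller(d, edge=0))   # 2.5 ns for every chip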
Then, in order to avoid having what amounts to a bus contention issue, they would have to specify that each RDRAM chip have enough "break before make" to avoid contending, plus enough margin to allow for differences in propagation speed between lines, etc. Of course the problem would occur on the first or last bits of a burst, just like any other typical bus contention issue.
I just had another thought... If Rambus solved this problem by requiring a dead space of 1.25ns (or maybe 2.5ns, since it is a DDR type clock), the effect might be a lot worse than merely a slight decrease in bandwidth. The reason is that the controllers for RDRAM are clocked internally at much lower frequencies than 400MHz or 800MHz. One would end up adding a dead time of one of the slower clock periods; otherwise you would lose the synchronicity between what happens on the RDRAM bus and what happens on the internal clock. (You could design a system that would let those clock domains run independently, but it would increase complexity a lot, and if you thought you had solved the bus contention issue, you wouldn't implement it, as it would be unneeded. The extra complexity would be due to the fact that the read time, for instance, would be harder to predict, rather than simply being a constant number of the slower clock periods.)
The specs I saw, over on the Rambus web site as well as on the LSI-Logic web site, were for the controller's internal clock rate (maximum) to be 100MHz or 200MHz, respectively. Though these things are programmable, these were what appeared to correspond to the highest bandwidths available. Since these are power-of-two divisions of 800MHz, adding a 1.25ns dead time means losing a full slow clock, i.e. 10ns or 5ns. If the data is being burst across in bursts of length eight (which is 16 bytes), then the time required to perform consecutive reads from different RDRAMs would be increased by 50% or 25%. While most consecutive reads would presumably be from the same RDRAM chip, this still might be enough of a performance hit to bum the engineers out.
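The arithmetic, roughly, is below. The 20ns burst window is my assumption about the baseline channel time a back-to-back burst occupies (the exact figure depends on how the controller pipelines requests); the quantization of the dead time to the internal clock is the part I'm fairly sure about:

    # Back-of-the-envelope cost of a dead time that has to be rounded up
    # to the controller's internal clock period.
    import math

    def quantized_dead_time_ns(raw_dead_ns, internal_clock_mhz):
        period_ns = 1000.0 / internal_clock_mhz
        return math.ceil(raw_dead_ns / period_ns) * period_ns

    BURST_WINDOW_NS = 20.0   # assumed channel time per back-to-back burst

    for mhz in (100, 200):
        dead = quantized_dead_time_ns(1.25, mhz)
        print(mhz, dead, dead / BURST_WINDOW_NS)   # 100 -> 10 ns (+50%), 200 -> 5 ns (+25%)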
Remember how Rambus talks about achieving a 94% bus utilization? This could be the cause of the problem: bus contention.
Incidentally, there is another issue with regard to output compliance. Any difference in RIMM impedance will result in a difference in signal voltage. Thus the reflected signal amplitude is not perfectly predictable, though you can place limits on it. The system would have to have enough extra compliance to overcome this effect as well.
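For what it's worth, the size of that effect falls out of the usual reflection coefficient at the impedance step. A quick sketch, where both the 28-ohm nominal channel impedance and the 10% RIMM tolerance are numbers I'm assuming for illustration, not quoting from the spec:

    # Reflection coefficient at the junction between the nominal channel
    # and a RIMM whose impedance is off by some tolerance.

    Z_NOMINAL = 28.0   # assumed nominal channel impedance, ohms

    def reflection_coefficient(z_rimm, z0=Z_NOMINAL):
        return (z_rimm - z0) / (z_rimm + z0)

    for tol in (-0.10, 0.0, +0.10):
        z = Z_NOMINAL * (1 + tol)
        print(f"{z:5.1f} ohm  gamma = {reflection_coefficient(z):+.3f}")
    # A few percent of the swing comes back as an unpredictable reflection,
    # eating into whatever compliance margin is left.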
-- Carl