Technology Stocks : Rambus (RMBS) - Eagle or Penguin
To: Ali Chen who wrote (34454)
From: Bilow
Date: 11/14/1999 3:48:00 AM
 
Hi Ali Chen; Re the CTM and CFM being shown as the same clock. They do assume this almost throughout the spec, but if you dig deep enough you will find where they are treated as separate clocks.

Take a look at Figure 59 in the Samsung KM416RD8C Revision 0.9, April 1999 Preliminary Direct RDRAM manual. (Available on the net from samsung.com.)

That figure shows the range of delays for CTM with respect to CFM, and how that works out on the bus. The text on the same page of the pdf file, titled "RSL - Domain Crossing Window" explains, with respect to the CFM and CTM clock domains: "i.e. there is no restriction on the alignment of these two clocks."

Minor note: You've got the CFM and CTM reversed; it is the CFM that clocks the commands out. See the section titled "RSL - Clocking" for an explanation.

Of course the clock domain problem doesn't show up on writes, as they use just the CFM clock. But reads take a command on the CFM clock, and transmit data on the CTM clock. As you note, this is not a totally trivial thing to do. But it isn't that hard.

Rambus makes the assumption that the two clocks, CFM and CTM, are stable with respect to each other, within certain limits, after initialization. The limits required are defined by the necessity that every RDRAM be able to recognize a particular CTM edge as being the first edge in a read burst. (It is obvious that all the RDRAMs will be able to agree that a particular CFM pulse is the start of a control burst, or a write burst, as the control burst is coincident with the CFM clock.) Failure to get all the RDRAMs to agree as to which CTM pulse is the first in a burst will mean that the controller will be unable to read the data.

As you mentioned, as the prop delay down the channel changes, the delay between CFM and CTM slides around. The question is how much it can slide around with the RDRAM still identifying a particular edge as the "golden" edge. I suggest that it slides around by less than 2.5ns.

If the circuitry is implemented as a simple counter in the CFM domain, and then an end count is transferred to the CTM domain, so as to pick up the "next" CTM pulse as the golden pulse, then, just as you stated, this will inevitably result in metastability for some RDRAM chips. In addition, as temperature (and voltage &c.) change, some RDRAMs will disastrously jitter around which edge they are using. But this can be avoided.
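
To make the failure mode concrete, here is a minimal sketch of the "pick the next CTM pulse" rule. It is only a timing model, not anything from the RDRAM spec: time is measured in clock periods, CFM edge n is assumed to occur at time n, CTM edge k at time k + skew, and the function name next_ctm_edge is made up for illustration.

```python
import math


def next_ctm_edge(cfm_edge: int, skew: float) -> int:
    """Index of the first CTM edge strictly after a given CFM edge.

    Assumed model (not from the spec): CFM edge n is at time n and
    CTM edge k is at time k + skew, all in units of one clock period.
    """
    # Smallest k with k + skew > cfm_edge.
    return math.floor(cfm_edge - skew) + 1


# Two chips whose CTM/CFM skew differs by a tiny amount around zero
# disagree about which CTM edge is the "golden" one.
slightly_late = next_ctm_edge(10, 0.01)
slightly_early = next_ctm_edge(10, -0.01)
print(slightly_late, slightly_early)
```

In this model a drift of a few percent of a clock period is enough to move the chosen edge by a whole clock, which is the jitter problem described above.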

Since I don't know what the internal circuitry of an RDRAM is, and don't care to find out, I obviously don't know how they solved this little problem. But I know how I might have solved it. So here is the way that I would be inclined to solve this. Note that I am typing this in on the fly, God only knows how I would change it with more thought.

First, notice that the minimum burst for RDRAM is four cycles long. Suppose we had a 2-bit binary counter running in the CTM domain. If we knew for a fact that all RDRAM reads were going to start on a fixed phase of that counter (for any given initialization), then we only have to transfer the READ command from the CFM domain to the CTM domain on every fourth CTM clock. This simplifies the problem immensely.

This division by four means that you can add in a lot more slop as to how the CTM and CFM signals are aligned with respect to each other. If I were doing the design, I might build the RDRAM internals on the CFM clock, and transfer READ command and read data to the CTM clock on every fourth clock. That would only require something like 100 flip flops, which is pretty small, but would mean that I would get a setup and hold time for the clock domain transfer of a nominal 2 clocks. After taking away a clock for the temperature dependent skew between CFM and CTM, I would still have 1.5 clocks for each of setup and hold. Tons of time.
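
The margin arithmetic in the paragraph above can be written out as a small sketch. The function name transfer_margins and its parameters are hypothetical; the only inputs taken from the post are the divide-by-four window and the one clock allowed for skew, and the even split between setup and hold is an assumption.

```python
def transfer_margins(window_clocks: float, skew_clocks: float) -> tuple[float, float]:
    """Return (setup, hold) margins in clock periods.

    Assumes the transfer window is split evenly between setup and hold
    and that the total CFM-to-CTM skew budget eats equally into both.
    """
    usable = window_clocks - skew_clocks
    return usable / 2, usable / 2


# Transfer on every fourth clock, with one clock budgeted for skew.
nominal = transfer_margins(4, 0)  # no skew: 2 clocks each
worst = transfer_margins(4, 1)    # one clock of skew: 1.5 clocks each
print(nominal, worst)
```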

So put a 2-bit counter in the CTM domain, and use it to choose which edges to sample read commands and read data on. If the counter has the incorrect phase, then the read data will show up at the wrong time. This is something that can be checked at initialization. So if it is wrong, then send a command to the RDRAM to pause the 2-bit counter for a cycle. (Of course this command has to be synchronized to the CTM domain, but this is trivial.) Repeat until the phase is correct.
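
A rough behavioral sketch of that initialization loop follows. Everything here is assumed for illustration: the class CtmPhaseCounter, the calibrate function, and the idea that the controller can tell "wrong phase" by comparing against its own reference count. The real part presumably detects this by looking at when test read data arrives.

```python
class CtmPhaseCounter:
    """Hypothetical 2-bit counter living in the CTM domain of one RDRAM."""

    def __init__(self, initial_phase: int) -> None:
        self.count = initial_phase % 4
        self._pause_pending = False

    def request_pause(self) -> None:
        # Stand-in for the "pause the counter for a cycle" command.
        self._pause_pending = True

    def tick(self) -> None:
        if self._pause_pending:
            self._pause_pending = False  # hold the count for this one cycle
        else:
            self.count = (self.count + 1) % 4


def calibrate(device: CtmPhaseCounter, max_tries: int = 4) -> int:
    """Pause the device counter until it matches the controller's phase.

    Returns the number of pause commands issued. The reference counter
    is an assumed stand-in for "the read data showed up at the right time".
    """
    reference = 0
    pauses = 0
    for _ in range(max_tries):
        if device.count == reference:
            return pauses
        device.request_pause()
        device.tick()                     # device holds its count
        reference = (reference + 1) % 4   # controller keeps counting
        pauses += 1
    raise RuntimeError("phase did not converge")


print([calibrate(CtmPhaseCounter(p)) for p in range(4)])
```

Each pause slips the device by one count relative to the controller, so with a 2-bit counter at most three pauses are needed.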

Note that when passing the READ command from the CFM to the CTM domain, you will only have to sample the READ bits on every fourth CTM clock. Of course you must avoid setup and hold violations on this; that is the essence of what we have been talking about. To do this, simply keep the READ command valid for four CFM pulses. Since there is a limit on how much the two clock domains may be out of phase, and since that limit is one clock period, you have almost 3 clock periods to divide up into setup and hold times.

The above scheme will allow the transfer of data between the two clock domains provided the command and data are always aligned to every fourth clock. If it is desired that the reading be able to happen on an arbitrary clock edge, then it is possible to achieve this generalization as well. The idea is to simply build four clock domain transfer circuits, each one running on a different phase of that divide-by-four circuit. This is only required for the phase selection part of the command, not for the whole 72-bit data transfer. The CFM side sends the READ command (which basically says take the 72 bits of data and serialize them) to one of the four clock domain synchronization circuits. Which circuit is chosen depends on the phase of the incoming READ signal with respect to a 2-bit counter in the CFM domain. The CTM circuitry samples the 72-bit data at a time determined by which of its four synchronization receivers got the signal.
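
The four-lane idea can be sketched as a simple timing model. This is an illustration under stated assumptions, not the actual RDRAM circuit: time is in clock periods, CFM edge n is at time n, CTM edge k is at time k + skew with skew between -0.5 and +0.5 (one clock of total range), each lane holds its READ flag for four clocks, and each lane's receiver samples two clocks after launch. The names route_read, HOLD_CLOCKS and SAMPLE_DELAY are made up.

```python
HOLD_CLOCKS = 4    # READ flag is kept valid for four CFM clocks
SAMPLE_DELAY = 2   # assumed: receiver samples mid-window, two clocks after launch


def route_read(cfm_cycle: int, skew: float) -> tuple[int, int, float, float]:
    """Return (lane, ctm_edge, setup, hold) for a READ launched on cfm_cycle.

    lane      which of the four synchronizers carries the command,
              chosen by the 2-bit CFM-domain counter
    ctm_edge  index of the CTM edge that lane's receiver samples on
    setup     time from flag valid to the sampling edge, in clocks
    hold      time from the sampling edge to flag removal, in clocks
    """
    lane = cfm_cycle % 4
    ctm_edge = cfm_cycle + SAMPLE_DELAY
    sample_time = ctm_edge + skew
    setup = sample_time - cfm_cycle
    hold = (cfm_cycle + HOLD_CLOCKS) - sample_time
    return lane, ctm_edge, setup, hold


# A READ on CFM cycle 5 with the worst-case late skew in this model.
print(route_read(5, 0.5))
# The same READ with the worst-case early skew lands on the same CTM edge.
print(route_read(5, -0.5))
```

In this model the chosen CTM edge does not depend on the skew at all, and the smaller of setup and hold never drops below 1.5 clocks across the assumed skew range.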

It's an interesting interaction between timing and logic, one of the things that modern design tools aren't particularly good at. Which, of course, is why designers do fully synchronous (i.e. single clock domain) designs, when they are fast enough. I don't think that the clock domain transfer issues are what brought Rambus down. Instead, I think the problem is with the very tight timing and margins required on all those parts.

-- Carl