Technology Stocks : Rambus (RMBS) - Eagle or Penguin

To: John Walliker who wrote (30695)9/26/1999 9:06:00 AM
From: Bilow  Read Replies (2) of 93625
 
Hi John Walliker; Re your comments of previous post...

No, it isn't some logic problem in the chip set. Those are the easiest problems in the world to catch; they don't wait until the last minute to show up. I've designed and debugged memory controllers for 18 years, and I hereby stake my reputation on the fact that this is not a simple logic problem.

A possible cause (in the controller) would be something having to do with the inherently long delays to the last few chips on a Rambus channel. When data gets transferred between two clock domains, there is always the likelihood of screwing the design up. It is possible that their clock-domain transfer logic doesn't work when the two clocks are too widely different. By the way, they use a DLL, so I would run the following test: if the above is the cause of the problem, then the problem will probably be worse at higher supply voltages to the controller. (Of course any number of other problems would also have that signature, but it is a fairly rare kind of problem. Usually systems have problems running at low supply voltage.)

Of course the problem can't be RMBS's fault. Or could it? Normally it is impossible to blame the company that writes the specification, only the engineers who turn out to be unable to achieve it, but this is not always the way it works. Sometimes specifications have subtle failings...

Regarding your comment on connectors: you noted that PC100 has more signals than Rambus, and concluded there would be "probably little difference." This is wrong.

Suppose a connector ended up with a bit of filth that happened to cause an increase in resistance of 5 ohms. That will not affect the PC100 system; in fact, it may run even better. But it will completely throw the Rambus system into the weeds. The reason is that the Rambus system has to be impedance matched, while PC100 does not. Basically, traditional, boring, slow, uninteresting signal logic levels are reliable, profitable, and robust. The Rambus logic system is unreliable, expensive, and sensitive.
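A back-of-the-envelope way to see the sensitivity: on a matched line, any impedance step reflects part of the wave. Here is a minimal sketch, assuming an illustrative 28-ohm channel impedance (the exact figure is my assumption; the point holds for any low-impedance matched line):

```python
# Why a few ohms of contact filth hurts a matched transmission line.
# The 28-ohm channel impedance below is an illustrative assumption.
def reflection_coefficient(z_load, z0):
    """Fraction of the incident wave reflected at an impedance step."""
    return (z_load - z0) / (z_load + z0)

z0 = 28.0          # assumed nominal channel impedance, ohms
dirty = z0 + 5.0   # 5 ohms of extra contact resistance in series
gamma = reflection_coefficient(dirty, z0)
print(f"{gamma:.1%} of the signal bounces back")  # about 8% reflected
```

On a high-impedance PC100-style net the same 5 ohms is a rounding error, which is the asymmetry the paragraph above is pointing at.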

The reason for using such low-impedance lines on Rambus channels is to minimize signal pickup, I think, but it is a little outside my area of specialty. The principle is that if you get enough electrons going in the same direction down as fat a wire as you can, you can be pretty sure the signal is going to get where you want it to. That is why the Rambus channel is a power hog.

By the way, I remember a time when many more connectors were gold plated than are currently. I don't know why they were phased out over the years; I suppose it must have been because connector wiper technology improved to the point where copper was reliable enough.

I agree about ECC. The truly amazing thing is how hard it is to discover errors in systems that don't use ECC. It turns out that the vast majority of the data that a computer manipulates just doesn't matter much, in terms of results to the user. Maybe you end up changing a pixel from having a color of (23,45,F7) to (22,45,F7), but nobody can see the difference anyway.
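That pixel example amounts to a single-bit flip in the low bit of one color channel, which a two-line sketch makes concrete:

```python
# A one-bit memory error in one color channel: 0x23 -> 0x22 is just
# the least significant bit flipping, invisible in a pixel (23,45,F7).
red = 0x23
red_with_bit_error = red ^ 0x01   # flip bit 0
print(hex(red), "->", hex(red_with_bit_error))  # prints "0x23 -> 0x22"
```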

There was a bestseller recently, The Perfect Storm, which is soon to be made into a movie. The author talks about waves so big that they break when they roll over the edge of the continental shelf; a great read. But the reason it comes to mind is his description of rogue waves. These are waves that are much, much larger than the average wave. Ships can be designed to survive the vast majority of waves in the worst storms, but that still means that every now and again, such a ship will get plastered somewhere.

The big problem with sending data at very high speeds is that in order to do it, you have to reduce the margins involved. There are noise voltages constantly moving up and down on traces, somewhat like the waves on the ocean. Every now and then, a rogue wave comes along and changes a one to a zero. You really can't avoid having these happen. What you can do, instead, is build so much margin into the system that only the roguest of the rogue events will break it.

If you can design a system so that it only fails once per 10^1000 cycles, then you can be quite certain that it will never break on this earth, no matter how many you build, and no matter how long anybody uses them. This, in fact, is one of the principles that secretly supports engineering. (Shhh! Don't let the customers find out! If they buy one machine for every electron in the universe, and operate them until half the protons in them have decayed, then it is likely that as many as three errors will have ruined their calculations!)
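For the curious, the spirit of that parenthetical can be checked with exact integer math. Every input below is a rough, illustrative guess of mine (machine count, clock rate, run time), not a real figure:

```python
from fractions import Fraction

# Back-of-the-envelope check of "fails once per 10^1000 cycles".
# All inputs are loose illustrative assumptions, as in the text.
machines = 10**80              # roughly one machine per electron in the universe
cycles_per_sec = 10**9         # assume a 1 GHz cycle rate
seconds = 10**41               # far longer than the age of the universe
failure_prob = Fraction(1, 10**1000)   # one failure per 10^1000 cycles

expected_failures = machines * cycles_per_sec * seconds * failure_prob
print(expected_failures < 1)   # True: the expected count is ~10^-870
```

Under these particular numbers the expected error count comes out astronomically below one, which is the whole trick.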

To put it another way: since we are creatures built of blood, mucus, and feces, and since we live very short lives of abject poverty and terror (compared to God), we don't have to build equipment that has the crystalline perfection of mathematics. In fact, we can't, and if we could, we'd just break it with our clumsy fingers and the dirty flakes of skin that constantly fall off of us...

A secret of engineering is to accept this imperfection and always build sufficient excess into our systems, but only enough excess to achieve success in a high enough percentage of cases that it appears to be perfection. (We don't have to fool God, we only have to fool the other mortals, and only for a little while. They'll be dead soon, just like us.)

-- Carl

P.S. Incidentally, engineers do calculate failure rates for transfers of data from one clock domain to another independent clock domain. There is always the chance that data will change at just the right time, and put a flip-flop in the receiving domain into a "metastable" state. These states resolve themselves into either a zero or a one after a time that follows an exponential distribution. There are equations that cover this, and the manufacturers usually give you the metastability parameters for the flip-flops in their process. If you can arrange not to sample the flip-flop for a long enough time (usually waiting 2 to 10 ns is enough), you can get the failure rate down to one failure in 10^10 years or so, depending on the clock and data rates, and engineers will generally build systems with such low failure rates. And, in the unlikely event that one actually does fail in the field, you can always blame the keypunch order-entry clerk, a technician, or perhaps a particularly lucky cosmic ray.
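The standard model behind those equations is MTBF = e^(t/tau) / (t_w * f_clk * f_data), where tau and the capture window t_w are the process parameters the manufacturer supplies. A sketch with invented parameter values (the real ones come from the fab, not from me):

```python
from math import exp

# Illustrative synchronizer MTBF estimate. The process parameters
# tau and t_w below are made-up round numbers for the sketch.
def mtbf_seconds(t_resolve, tau, t_window, f_clk, f_data):
    """Standard metastability model: MTBF = e^(t/tau) / (t_w * f_clk * f_data)."""
    return exp(t_resolve / tau) / (t_window * f_clk * f_data)

tau = 0.1e-9     # assumed resolution time constant, 0.1 ns
t_w = 0.1e-9     # assumed metastability capture window, 0.1 ns
# Wait 10 ns before sampling, with a 100 MHz clock and 50 MHz data rate:
mtbf = mtbf_seconds(10e-9, tau, t_w, 100e6, 50e6)
print(f"MTBF ~ {mtbf / 3.156e7:.1e} years")
```

The exponential in the numerator is why a few extra nanoseconds of settling time buys so many orders of magnitude of reliability.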