To: fingolfen who wrote (142228), 8/27/2001 9:02:35 PM
From: pgerassi
 
Fingolfen:

You still fail at reading comprehension! Where did I say that "could be a problem" and "is a problem" are the same thing? You assumed that I was stating that they absolutely have a problem. I did not, and you didn't like what I said. So instead of sitting down and reading what William Henning wrote (you did at least quote the relevant section), you failed to see that the symptoms are of the same kind as the P3-1.13 problem: code that runs fine on a slower bin fails at the higher speed.

Now that could be (this does not mean must be) due to one of four possible causes:

1) There is an overflow or underflow during some timing loop, a la MS Windows with high-speed CPUs. I remember all of the hubbub over the K6-2 speed grades and the Windows boot problems traced to this cause (see the sketch after this list).
2) There is a timing issue in the CPU itself that mishandles some sequence of instructions and eventually causes an error. This can be hard to find even when you know there is a problem; it is what killed the 0.18u aluminum P3-1.13.
3) There is a problem in a newer P4 stepping used to reach 2GHz on 0.18u aluminum. The older P4 1.5 would not have this bug since it came from a good stepping.
4) This was just a bad P4 CPU that somehow slipped through the verification suite or broke before Henning got it. This was suspected with Tom's CPU until another site confirmed that it had seen the same thing and was waiting for Intel to figure out what happened.
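
To illustrate cause 1, here is a hedged sketch in C of my own (not the actual Windows or Intel code; every name and constant in it is invented) showing how a timing-calibration loop that was fine on slower parts can wrap and start failing only on a faster CPU:

#include <stdio.h>
#include <time.h>

/* Count how many empty loop iterations fit inside one clock() tick. */
static unsigned long spin_for_one_tick(void)
{
    clock_t start = clock();
    unsigned long count = 0;
    while (clock() == start)
        count++;
    return count;
}

int main(void)
{
    unsigned long count = spin_for_one_tick();

    /* The latent bug: someone assumed no CPU could ever spin more than
       65535 times per tick, so the result lives in a 16-bit field.  On
       a slow part it fits; on a fast enough part it wraps, and any delay
       later computed from it (multiplied out, or divided into a constant)
       is garbage or a divide by zero. */
    unsigned short loops_per_tick = (unsigned short)count;

    printf("raw count = %lu, stored 16-bit value = %u\n",
           count, loops_per_tick);
    if (loops_per_tick == 0)
        printf("wrapped to zero: a delay routine dividing by this would fault\n");
    return 0;
}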

There could be a more esoteric bug somewhere, but unless it clearly is a timing loop (a dump taken when the program fails can show what error was encountered), it usually is not software. I remember a case where a cache problem had hardware and software blaming each other for six months until the customer got fed up. After being asked to look into it, I found the bad cache in about two hours, which just shows what a fresh perspective can do to solve a problem. There were some shamefaced hardware engineers that day. No need to say how the software engineers felt.

You forget how long it took Intel to admit to the FDIV problem. Intel did not solve the P3-1.13 timing problem on 0.18u aluminum for nine months, and in the end it took 0.13u copper to fix it. I guess the older process was pushed too far! Also, Intel has still failed to get their compiler to compile Scimark. It has been over a year already!

As to the driver problem, that problem was slow running, not a failure to run, and I think a failure to run is a bigger problem than running slowly. Unless there is a software timer in a driver (very unlikely, since interrupts and clock ticks are easy to obtain in a driver running in kernel space), too much CPU speed usually does not affect a driver; the usual problem is the opposite, not having enough speed to do something fast enough. Without knowing which benches failed to run, it is difficult to track down a possible cause. Any problem that is not timing dependent would also have failed on the P4 1.5GHz. Thus it must be a timing-dependent problem (possibly code the designer never thought of as timing dependent, but how many of us have fallen into the trap of CPUs doing what we say and not what we meant?).
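
For what it is worth, here is another hedged sketch (plain user-space C of my own, not real driver code; the loop count and tick source are just assumptions for the example) contrasting the speed-dependent spin delay with the tick-based delay that kernel code normally uses:

#include <stdio.h>
#include <time.h>

/* Speed-dependent delay: "tuned" on one CPU, wrong on every other one. */
static void delay_by_spinning(void)
{
    volatile unsigned long i;
    for (i = 0; i < 50000000UL; i++)
        ;   /* finishes in half the time every time the clock doubles */
}

/* Speed-independent delay: wait for the clock itself to advance. */
static void delay_by_ticks(clock_t ticks)
{
    clock_t target = clock() + ticks;
    while (clock() < target)
        ;   /* same elapsed time on a 1.5GHz or a 2GHz part */
}

int main(void)
{
    clock_t t0 = clock();
    delay_by_spinning();
    printf("spin delay took %ld clocks (varies with CPU speed)\n",
           (long)(clock() - t0));

    t0 = clock();
    delay_by_ticks(CLOCKS_PER_SEC / 10);
    printf("tick delay took %ld clocks (stable across CPU speeds)\n",
           (long)(clock() - t0));
    return 0;
}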

The first thing we definitely need to know is which benchmarks failed and how. The next is to try a different 2G P4 CPU. If that one fails too, Intel has a bigger problem. If it does not, the original failing CPU should be placed in that platform to see whether the failure follows the chip. If the problem can be duplicated with another P4 CPU of the same speed, Intel at least has a verification problem (JPEG style). If the problem covers multiple MBs, OSes, and drivers, Intel has a really big problem.

That is a lot of ifs, but troubleshooting's goal is to determine the boundaries of the problem and reduce it to a small, easy-to-duplicate pass/fail test. Once the symptoms are well known and bounded, the causes should be relatively easy to find, and by that point the extent of the problem is known. If it is only one CPU within a very narrow boundary, the fix is easy: replace that CPU. If it affects a large percentage of CPUs and the bounds are wide, it is a JPEG-style problem. If it is all CPUs across a lot of different code and environments, an FDIV or P3-1.13 class problem is in the works. Until the troubleshooting is done, we just do not know!
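
As a rough idea of what that bounding step looks like (the test commands here are placeholders, not Henning's actual benchmarks), a tiny harness like this, run on each CPU/board/OS combination, is enough to build the pass/fail matrix:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Placeholder commands; substitute whichever benchmarks actually failed. */
    const char *tests[] = { "./bench_a", "./bench_b", "./bench_c" };
    const int ntests = sizeof(tests) / sizeof(tests[0]);
    int i;

    for (i = 0; i < ntests; i++) {
        int status = system(tests[i]);   /* nonzero means a crash or bad exit */
        printf("%-12s %s\n", tests[i], status == 0 ? "PASS" : "FAIL");
    }

    /* Rerun with a second 2GHz P4, then with the suspect CPU in a known-good
       board: one CPU failing everywhere points to a bad part, every 2GHz
       part failing everywhere points to a much bigger problem. */
    return 0;
}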

As to jumping to a conclusion, you speculate a lot based on flimsier evidence. Besides, I used a lot of ifs and their synonyms, but you appear to have simply ignored them. That is your problem, not mine. You assumed a lot without checking.

Again, to simplify the original post for those who have trouble reading a heads-up message, here is what we know about the possible problem:

1) Henning has some benchmarks that he has been using in all of his CPU tests for a long time.
2) Some of these long- and oft-used benchmarks fail on a 2GHz P4 on an i850 motherboard.
3) These benchmarks all work on a 1.5GHz P4 on various i850 MBs.
4) These benchmarks have all worked on all previous x86 CPUs tested.
5) He speculated that it may be a driver issue but could not track it down.

I said this fits the symptoms of the P3-1.13 problem noted by Tom: the faster CPU fails while the slower one has no problem compiling the kernel (how could Intel miss testing such an often-done task?). I then speculated that this could be:

1) A small problem of a bad CPU.
2) A software timing problem.
3) A larger problem of the same size as the P3-1.13G one (again, the last CPU release on a given process being pushed too far).
4) A huge problem affecting far more CPUs than the 200 involved in the above problem.

Note that two of the synthetic benchmarks from his 1.33GHz Tbird review are missing from the 2G P4 review: the Final Reality Benchmark and 3DMark 99. Since both are missing, I believe Final Reality would be the larger problem, but 3DMark 99 is one that Intel should have tested (heck, they should have tested both of them). The other missing benchmarks are games: Quake 2, Turok, Incoming, and Forsaken. If the sample 2GHz P4 fails to run any of these, either the sample P4 CPU is bad or Intel could be in a world of hurt (how could they miss testing these games?). All six of these should run on any good P4.

Pete