To: Charles Gryba who wrote (93413) 2/21/2003 1:33:15 AM From: pgerassi

Dear Charles:

They went radical because they were getting beaten doing standard stuff. The problem with radical is that it leads to many dead ends. Itanium and P4 are dead ends. Things like double-clocked ALUs and trace caches may sound great, but they are fraught with problems that keep coming home to roost. Besides, what Intel did is not truly radical, as each thing they have done has been tried before and found to be a dead end. But do they listen? No, they were arrogant. They knew better. Well, they have dug themselves into a hole, and they are trying to get back to the normal path with Centrino.

Now AMD has this terrific, well-balanced core. It is excellent on traditional code and can keep up with the gyrations of Intel's new-direction code. Why rework such a good back-end RISC engine? So they look for modifications that do not change the basic core much yet deliver significant improvements. And they listen to cutting-edge and leading developers about what they want in a new CPU. Then they try to add it in such a way as to grandfather existing code (deprecating little-used stuff when needed) yet work those desires into the later cores. 3DNow!, PowerNow!, exclusive cache, larger L1s, DDR, P2P EV6, more associativity and fully pipelined FPUs are some of the changes made in the past. Now they are adding 64 bit, integrated on-die DDR DRAM controllers, HT, glueless ccNUMA MP and micro-op fusion.

Now Intel has used up the traditional ways of speeding up its CPUs: large caches, more levels of cache, wide cache transfers, super-long pipelines, lower Vt, thinner gate oxides, lower max temps and other such tricks. In the past they have tried radical things to eke out a few more MHz, like the notched gates (better known as the botched gates process), which failed miserably.
And they try other gee-whiz features that later turn out to have shot them in the foot, like VLIW with revisions. The theory was to put the optimization into the compiler and simplify the control logic in the CPU, so the hardware is easier to make and the compiler harder. Well, the compiler was so fraught with problems (easily foreseen from those who have gone this way before) that changes were made to the hardware to make things easier for the compiler. That resulted in a lower-performing, complex CPU with a complex instruction set that is difficult to program, and still the compiler isn't up to snuff. That is why the Itanium continues to be too slow and needs all the above tricks to stay with the performance evolution of more traditional CPUs.

So Intel is not really trying radical things but is just going too far for the current state of the art. If two more stages are reasonable, they try a dozen more. If ten more registers are reasonable, they add a hundred.

The Achilles' heel of the P4 is that it is totally unbalanced. The current bottleneck can be traced back to the narrow, slow x86 decoder and the missing barrel shifter. Do they do the intelligent thing and fix these? No, that would be admitting their error in making it so narrow to begin with. They know better, and that isn't it. It is just a fantastic new relayout with twice the L1 and L2 caches on a new, smaller process, with the SOI we deprecated at first, and we will add full depletion (to keep egg off our face) and strain the silicon to boot (it would not be fast enough otherwise to allow the P4 to keep up). It will go 5GHz (forgetting that it would just beat a 2.5GHz Opteron that uses DDR 80% as fast and can still be made on 130nm partially depleted SOI with relaxed silicon) and be the next great thing (well, for us anyway).
We won't mention that the 90nm Opteron will be out in the same time frame, clock 75% as fast as ours and yet deliver 50% more performance from a single core (forget that they will go dual core, it's embarrassing), support far more memory, use much less power and need no glue for up to 16-way ccNUMA systems. We will state that in 7 years we will go 3 to 4 times as fast and go the multicore route (our single cores just won't be able to keep up).

Intel is not executing on the architecture side (except perhaps for the Centrino, and the jury is still out there). The Northwood to Prescott transition is more like the Thunderbird to Barton transition. It is a relayout and a doubling of caches on a smaller process, but not as well done (it misses the leakage reduction of the Tbird to Palomino optimization).

Pete