To: Elmer who wrote (118804) 11/22/2000 2:36:01 PM From: pgerassi

Dear Elmer:

When one talks of idiots, one should really take a hard look at oneself. The P4 implementation of x87 code is really bad. Even the new Sandra benchmarks with SSE2 optimization report x87 speed alongside SSE2 speed, and the x87 speed is about that of a top end Celeron. Without SSE2, P4 is dog meat wrt other CPUs.

x87 code is used by many applications for the increased accuracy and delayed loss of significance that the 80 bit temporary real data type provides, and by almost all applications needing something more than 32 bit floats. The first group will not use SSE, and the second will not use it because of the lag in compiler and vendor upgrades. Intel, by the lousy performance, has pretty much handed that market to AMD.

Intel has also handed the encryption and communications markets (heavy users of LSMs and bitwise operations) to AMD by eliminating the barrel shifter from P4. Since barrel shifters do not limit performance, the removal could only be for die area reasons. Ditto for the combination of the x87 multiplier pipeline and addition pipeline into one pipeline. Both of these design decisions have haunted and will continue to haunt Intel wrt P4.

The one reason for buying a PC is that the main processing unit is general purpose in nature. It does many things well instead of a few things great. Those tasks needing greatness are placed in special purpose hardware. Graphics cards are taking over the large loads of geometry processing and MPEG compression and decompression. Sound cards are taking over the large loads of MP3 compression, MP3 decompression, 3D environment modeling, and voice to syllable conversion. These cards relieve the CPU of the tasks that are done frequently enough to make specialized hardware cheap and profitable.
What is not being moved off the CPU are the tasks that are complex in nature, like AI, web serving, parsing, decision support systems, modeling, and scheduling. All of these typically are branch heavy, have large working code set sizes, and have high processing to bandwidth ratios. For these future GP CPU applications, which will make up an increasing proportion of the processing work load, the P4 does very poorly.

The one area where the P4 is competent is the large data set, small code set programs. These do very little decision making, so they fit comfortably in the trace cache and their few branches cause very little pipeline stalling. The P4 does get the data in and out fast, but this advantage can disappear rapidly.

All in all, the P4 is not the "barn burner" Intel promised. It is more like the false fronts used in the motion picture business, in that it looks good only from certain angles (like the one room cabin with a front attached to make it look like a mansion).

Pete