To: Joey Smith who wrote (108250) 8/24/2000 5:28:36 PM From: pgerassi

Dear Joey:

A launch director is not a designer. A mis-prefetch is an oxymoron. Intel ran a supposedly air-cooled, hand-picked part for a few seconds (1 to 5) with a simple clock utility. You can run a Tbird with no heatsink for a few seconds at 1000 MHz, but no one but a fool would try to boot a computer that way. Did they open the case, show it encased in plexiglass, or better yet uncased, and show you what cooling method they actually used? The "ordinary" air-cooled part looked like an 80mm fan on top of a one-pound heatsink. You can throw in a Peltier element or two and still call it air cooled. Heck, a Kryotech case can still be called air cooled (the fact that the condenser rather than the CPU is air cooled does not matter), and enough of them exist that you cannot call it special anymore either. The demo was a staged act anyway.

Ace did not have a P4 to bench either. His trace cache micro-op size is incorrect because he forgot that these are RISC ops, and RISC instructions are typically at least as big as the word size, plus they need an address to make sure they refer to the right decoded ops. Thus each op takes a minimum of 8 bytes, and more likely 10 or more, which yields a trace cache somewhere between 96 and 192KB in size. Even Intel admits that the IPC of the P4 is less than that of the P6 (and the P6's is less than that of the Athlon). Since there is no cache in front of the decoders, an additional L2 latency must be added to the decoding pipeline on a trace cache miss, which is far more likely when there is a branch misprediction, so a miss could easily exceed 30 cycles. Once you go through Johan's calculations, they say that the IPC is definitely lower than that of the P3, directly opposing his own conclusions.
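For anyone who wants to check the trace cache size estimate above, here is a rough sketch of the arithmetic. The 12K micro-op capacity is an assumption on my part (it is the entry count that makes 8 bytes per op come out at 96KB), and the function name is only for illustration.

```python
# Rough sketch: storage implied by a given size per decoded micro-op.
# ASSUMPTION: the trace cache holds 12K micro-ops (12 * 1024 entries).
# That figure is not stated in the post; it is what makes 8 bytes/op = 96KB.
TRACE_CACHE_OPS = 12 * 1024


def trace_cache_kb(bytes_per_op: int, ops: int = TRACE_CACHE_OPS) -> float:
    """Return the storage in KB needed for `ops` micro-ops of `bytes_per_op` bytes each."""
    return ops * bytes_per_op / 1024


if __name__ == "__main__":
    # 8 bytes is the stated minimum, 10 the "more like" figure,
    # and 16 is the op size that corresponds to the 192KB upper bound.
    for size in (8, 10, 16):
        print(f"{size} bytes/op -> {trace_cache_kb(size):.0f} KB")
```

Under that assumption, 8 bytes per op gives 96KB, 10 bytes gives 120KB, and the 192KB upper bound corresponds to 16 bytes per op.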
There are quite a few negative surprises: "So this seems to be a small mystery.", "P4 will suffer up to 50% more from branch misprediction.", "shift and multiplies are not" (handled quickly), "P4 has a better FPU than the P6!! Why? The P4 offers much more bandwidth via the L2-cache and the 400 MHz FSB than the P6 does" (bandwidth in the fetch does not belong to the FPU), and "Intel reported that the Sysmark Windows Media encoder test reported 50% higher numbers on the P4 1.4 GHz chip (i850 chipset) than the 1 GHz P3 (i820 chipset). Both systems contained RDRAM" (why did they not want to compare it to the 815E or 840?). All of these inconsistencies show that Johan does not understand internal CPU architectures or how performance is measured.

BTW, Tbird has 65% better throughput on PC1600 than on PC133 (or 120% better than on PC100) in independent benchmarks (see PCwatch) on the same CPU, so the 150% result above is less than what a simple chipset change delivers, even with 140% of the clock.

Thus, Johan is definitely wrong, that was no CPU designer (Intel or otherwise), and Intel did not run at a sustained 2 GHz.

Pete