Technology Stocks : Intel Corporation (INTC)


To: Elmer who wrote (111436)9/27/2000 11:08:51 AM
From: pgerassi
 
Dear Elmer:

Since I have built applications with built-in RDBMS engines (just not as fancy), I know what the instruction mix for such applications is. I have also looked at the freeware Apache and Linux sources (have you ever seen Linux?), so I know how that code is put together. Given a known instruction mix, I can reasonably estimate P4 performance. Given the design criteria and the way the pipeline is structured, P4 will do poorly at those types of tasks, because too many pipeline stalls will occur. The trace cache is very poor at dynamic code (i.e. JavaScript, AI-like code, and RDBMS engines, which are chock full of conditional branches), and the fact that the x86 decoders run at half speed means that P4 will be really slow when the trace cache is thrashing. Since the P4 decodes only 1 instruction every two cycles, compared to the Tbird, which decodes up to three instructions every cycle, this is the bottleneck that will do in the P4 on this type of complicated code. The jobs that work very well with a trace cache are those that perform simple, fixed operations on massive quantities of data: exactly the kinds of programs that DSPs, 3D engines, and MPEG compression chips are optimized for.

P4 may have bottlenecks that do not show up in this second type of code, further restricting performance. One oddity is that the trace cache issues only up to three micro-ops per cycle, yet there are two double-clocked ALUs, or effectively something like four simple-op slots per cycle. Given this, the bottleneck would be that the trace cache cannot issue enough micro-ops to keep them busy. The only way it would be balanced is if the ALUs mostly performed multi-cycle operations, which would defeat the purpose of double-clocking them. Perhaps the new respin will remove these oddities in the design.

Of course, Intel's own actions, read between the lines, provide the best evidence that these suppositions are close to the truth. The gap between the highest-clocked P3 and the lowest-clocked P4 must have a reason, and shipping no P4 grade below 1.3 GHz is something Intel has never ever done before, especially since the slower they go, the cooler they run and the less power they use. The far more likely reason is that the lowest grade of P4 delivers about the same overall performance as the highest shipping grade of P3, which is more expensive. Intel's prices confirm this notion. This is the same reason the Tbird outsells the K75 at the same speed.

Pete