Technology Stocks : Intel Corporation (INTC)


To: John F. Dowd who wrote (163253)3/30/2002 1:05:33 PM
From: Dan3  Read Replies (1) | Respond to of 186894
 
Re: You sound like Robert McNamara trying to build the fighter for all missions...

John, where were you while Intel and Microsoft demonstrated the natural monopoly that evolves from the installed base?

How's Alpha doing? How's OS/2 doing? How's PA-RISC doing? How's the Amiga doing? How's Itanium doing? How's SGI's IRIX doing? How's Wang doing?

Each of these solutions was superior, for its targeted niche, to the mainstream solution that constituted the installed base. But look at what happened to them.



To: John F. Dowd who wrote (163253)3/30/2002 1:35:22 PM
From: Dan3  Read Replies (1) | Respond to of 186894
 
Re: if you are talking about supplying a chip set with a cpu then INTC will be the clear leader

There's a bit more to it than that. No matter how great a job Intel engineers like Tenchusatsu do executing on the chipset, Xeon is still handicapped by the basic architecture decreed by the Intel pointy-headed bosses who demanded Rambus:

There are a number of improvements due to x86-64 [Hammer].

1) The addition of 8 new GPRs allows more temporary data and intermediate results to live in registers instead of spilling to memory in L1, which costs 6 additional cycles to store/retrieve and can only start 2 accesses per cycle. Registers take only 1 cycle, and 9 reads can be obtained per cycle (6 for the 3 integer ALUs and 3 for the 3 AGUs). This will lead to about a 10-20% improvement when code is compiled to take advantage of it.
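The register-vs-L1 arithmetic above can be sketched as a back-of-envelope calculation. The latency and port figures are the ones quoted in the post, not measured numbers for any real chip, and the 18-access loop body is a made-up example:

```python
import math

# Figures quoted in the post (assumptions for this sketch, not measurements)
L1_EXTRA_CYCLES = 6   # extra latency to spill/reload a temporary via L1
L1_PORTS = 2          # L1 accesses that can start per cycle
REG_CYCLES = 1        # register access latency
REG_READS = 9         # register reads per cycle (6 ALU + 3 AGU operands)

def cycles_for_accesses(n, latency, per_cycle):
    """Rough cycles to issue n operand accesses: issue-rate limit plus latency."""
    return math.ceil(n / per_cycle) + latency - 1

n = 18  # hypothetical: 18 operand accesses in a hot loop body
via_l1 = cycles_for_accesses(n, REG_CYCLES + L1_EXTRA_CYCLES, L1_PORTS)
via_regs = cycles_for_accesses(n, REG_CYCLES, REG_READS)
print(via_l1, via_regs)  # spilled temporaries take several times longer
```

Even this crude model shows why relieving register pressure pays off well before the 64-bit features matter.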

2) The above also has a secondary effect: the additional GPRs force more registers into the virtual (reorder) pool. This enlarges the reorder window, causing fewer stalls and more performance. This will add about 1-2% to IA-32 code.

3) The addition of 8 new 128-bit SSE/SSE2 registers. This does the same for floating point that the 8 new GPRs do for integer performance, and it also enlarges the virtual pool for floating point as in point 2. Performance increases here will be between 20 and 40%, with special cases over 100% (3x3 matrix multiplication).

4) 64-bit addressing (in the Hammer-family implementation, 40-bit physical and 48-bit virtual). This far exceeds any IA-32 CPU: 256 times more memory and 4096 times more virtual memory than Xeon. Much larger simulation problems and other workstation and supercomputer tasks become possible. Large databases routinely exceed 100GB and some exceed 1TB; standard rules of thumb call for between 10 and 100GB of main memory for these, plus program code and data requirements above that. Xeons and their IA-32 brethren can't use that much memory, so their database performance falls off at the larger sizes.

In addition, OSes like Linux use unused (idle) memory to cache disk (compared to memory, disk is very slow: roughly 40x the bandwidth gap and 100x the access-time gap). With the larger memory footprints it is conceivable that future systems may do it all in memory rather than reading and writing from disk, which is the reason solid-state disks are popular with the large DB super-servers. And your IA-32 programs do not need to be recompiled to take advantage of this: an OS like Linux will do it if it is compiled for x86-64 "Long" (64-bit) mode, so any application that uses disk will be much faster. Just think, those games with long load times due to the amount of reading from the CD (or later DVD) will load in a tenth or a hundredth of the time; level switches under 1 second give a more seamless run-and-gun time for the FPS fanatics out there.
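The 256x and 4096x ratios above only work out if the physical comparison is against IA-32's flat 32-bit (4GB) space while the virtual comparison is against the 36-bit (64GB) space reachable via PAE; a quick sketch of the arithmetic under that assumption:

```python
# Address-space arithmetic behind the post's ratios. The choice of IA-32
# baselines (32-bit flat physical, 36-bit PAE) is this sketch's assumption.
GB = 2**30

hammer_physical = 2**40   # 40-bit physical addresses -> 1TB
hammer_virtual  = 2**48   # 48-bit virtual addresses  -> 256TB
ia32_flat = 2**32         # 4GB flat 32-bit space
ia32_pae  = 2**36         # 64GB reachable via PAE

print(hammer_physical // ia32_flat)  # 256
print(hammer_virtual // ia32_pae)    # 4096
print(hammer_physical // GB)         # 1024, i.e. 1TB physical per system
```

Under a single consistent baseline the virtual-memory ratio would be different, but these are the pairings that reproduce the quoted figures.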

Some additional benefits accrue to changes in the implementation of the Hammer family of x86-64:

5) The front end has three full x86 decoders, versus the 1 full and 2 simple decoders of the Athlon. This will allow more IPC even in IA-32 code and may increase performance 5 to 10%.

6) A new stage will attempt to combine micro-ops into fewer micro-ops, increasing the number of x86(-64) instructions that can be scheduled at the same time. This may increase performance 2 to 4%.

7) A larger TLB means fewer memory references are spent on virtual-memory mapping, adding 2 to 4% more performance.

8) Better address prediction will eliminate some stalls and improve performance another 2 to 4%.

9) Some branches may have both paths executed speculatively, meaning fewer stalls; this improves performance between 1 and 5% depending on how often it happens.

10) On-die DRAM controller. This shaves tens of cycles off DRAM latency and improves effective bandwidth, a large boost of 10 to 20% depending on the application.

11) HT links between CPUs for "glueless" SMP. This makes the Hammers scale much closer to 1:1 as CPUs are added. Most SMP boxes get a 50 to 80% increase from the second CPU, 25 to 50% of a CPU from the third, and 10 to 25% from the fourth. Hammer will probably add 95% for the second CPU, 90% for the third, and 85% for the fourth. That adds up to 1.85 to 2.55 effective CPUs for a quad FSB-based [Xeon] box versus 3.7 effective CPUs for Hammer. The difference comes from each CPU having local memory and being able to reach remote memory in about 140ns, versus 300ns or more, if it is possible at all, for chipset-based designs.
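The effective-CPU totals above follow directly from the per-CPU scaling percentages; a small sketch of that sum, using the post's estimates (not benchmark data):

```python
# Effective CPU counts for a quad SMP box, from the post's scaling estimates.
def effective_cpus(increments):
    """First CPU counts as 1.0; each later CPU adds its fractional share."""
    return 1.0 + sum(increments)

fsb_low  = effective_cpus([0.50, 0.25, 0.10])  # pessimistic FSB scaling
fsb_high = effective_cpus([0.80, 0.50, 0.25])  # optimistic FSB scaling
hammer   = effective_cpus([0.95, 0.90, 0.85])  # glueless HT scaling

print(fsb_low, fsb_high, hammer)  # roughly 1.85, 2.55, 3.7
```

So even the best-case FSB quad delivers well under three CPUs' worth of work, while the HT-connected quad loses only about 8% of its nominal capacity.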

12) Multiple I/O HT links make it possible to attach more devices. A single-CPU ClawHammer could have an 8x AGP slot, two 3-slot PCI-X buses, a 5-slot PCI bus, and all of the usual southbridge-attached peripherals. This is far larger than most x86 servers from other makers, including Intel; dual ClawHammer SMP boards can attach even more, and quad or octal SledgeHammer boards more still.

Overall, even with current 1GB registered DDR DIMMs, a SledgeHammer system could have as much as 64GB of memory (some 2GB DIMMs are showing up, so 128GB is just around the corner), with about 42GB/sec of total memory bandwidth and 51.2GB/sec of total HT I/O bandwidth across 8 SledgeHammer dies. This is far beyond any current x86-based system, and beyond all but exotic supercomputer platforms.
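The aggregate figures can be reconstructed if each die is assumed to contribute a dual-channel DDR333 memory controller and one 16-bit, 1.6GT/s HyperTransport link pair; those per-die rates are this sketch's assumptions, not numbers stated in the post:

```python
# Reconstructing the 8-die aggregate bandwidth figures.
DIES = 8

# Dual-channel DDR333: 2 channels x 8 bytes wide x 333 MT/s (assumption)
ddr333_dual_channel = 2 * (64 / 8) * 333e6 / 1e9   # ~5.3 GB/s per die

# One HT link pair: 2 bytes wide each way x 2 directions x 1.6 GT/s (assumption)
ht_link = 2 * (16 / 8) * 1.6e9 / 1e9               # 6.4 GB/s per die

mem_total = DIES * ddr333_dual_channel
ht_total = DIES * ht_link
print(round(mem_total, 1), ht_total)  # ~42-43 GB/s memory, 51.2 GB/s HT
```

The HT figure lands exactly on 51.2GB/sec, and the memory total comes out within rounding of the quoted "about 42GB/sec".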

All of this yields large improvements in x86 (IA-32) performance, more with a 64-bit x86-64 OS, and even more with 64-bit compiled applications. The latter may get more than a 50% improvement overall compared to an equally clocked Tbred and may be faster than a double-clocked P4 NW. For the huge-dataset problems, P4 NW will be left in the dust, forcing Intel to add x86-64 to its CPUs, which is rumored to be in the works.

Pete aceshardware.com