
Technology Stocks : Intel Corporation (INTC)


To: Tenchusatsu who wrote (109154)9/1/2000 9:49:14 AM
From: Scumbria
 
Ten,

Some new P4 benchmarks on the Web. They show a 1.4GHz P4 beating a 1.0GHz Athlon by about 18%. (A linear speedup would be 40%.)

homepage1.nifty.com

This falls in line with my expectations of P4 running about 20% slower per clock than T-Bird. If AMD delivers a 1.5GHz Mustang w/DDR in Q1, P4 will have to run close to 2.0GHz to keep up.
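A quick back-of-the-envelope check of those numbers (a Python sketch; the 1.18 performance ratio is taken from the benchmark claim above, and linear clock scaling within each core is assumed):

```python
# Rough per-clock comparison implied by the benchmark numbers above.
# Assumes performance scales linearly with clock within each core.

p4_clock, athlon_clock = 1.4, 1.0   # GHz
p4_lead = 1.18                      # P4 score / Athlon score from the benchmark

# How fast is the P4 per clock, relative to the Athlon?
per_clock_ratio = p4_lead / (p4_clock / athlon_clock)
print(f"P4 per-clock vs Athlon: {per_clock_ratio:.2f}")   # ~0.84, i.e. ~16% slower

# Clock the P4 would need to match a hypothetical 1.5GHz Athlon-class part
needed = 1.5 / per_clock_ratio
print(f"P4 clock needed: {needed:.2f} GHz")               # ~1.78 GHz
```

The arithmetic gives roughly 16% slower per clock rather than a full 20%, but either way the conclusion holds: to pace a 1.5GHz part, the P4 would need to run somewhere near 1.8 to 2.0GHz.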

Scumbria



To: Tenchusatsu who wrote (109154)9/1/2000 10:17:14 AM
From: Rob Young
 
Tench,

What you have to understand in all this is that Digital
spent a good deal of time investigating VLIW and, after
that investigation, rejected it. He isn't tuning a fiddle,
and only time will prove that. And yes, IA64 is a "dumb
machine" because it is an in-order machine. He is being
colorful when he says that, but it is not much of a
stretch. It is a dumb machine: the compiler says "here,
play this record" and the machine says "otay!" An OOO
machine, by contrast, gets quite creative to prevent
stalls. Sure, you guys have predication and speculation,
and Alpha has CMOV.
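To make the in-order vs. OOO distinction concrete, here is a toy Python sketch. Everything in it is invented for illustration: a four-instruction program, a 3-cycle load, single-issue in-order execution, and an idealized (unlimited-width) out-of-order model.

```python
# Toy model: (name, latency in cycles, names of instructions it depends on).
# The load is slow; the two multiplies are independent of it.
program = [
    ("load A", 3, []),
    ("add B",  1, ["load A"]),   # must wait for the load's result
    ("mul C",  1, []),           # independent work
    ("mul D",  1, []),           # independent work
]

def finish_time(prog, in_order):
    """Return the cycle when the last instruction completes.

    in_order=True:  single-issue, strictly program order (the "dumb machine").
    in_order=False: idealized OOO, each op issues as soon as operands are ready.
    """
    done = {}          # completion cycle per instruction
    prev_issue = -1    # issue cycle of the previous instruction
    for name, lat, deps in prog:
        ready = max((done[d] for d in deps), default=0)
        issue = max(ready, prev_issue + 1) if in_order else ready
        done[name] = issue + lat
        prev_issue = issue
    return max(done.values())

print("in-order:", finish_time(program, True), "cycles")    # 6: everything stalls behind the load
print("OOO:     ", finish_time(program, False), "cycles")   # 4: the multiplies slip past the stall
```

The in-order machine makes the multiplies queue up behind the stalled add; the OOO machine slides them into the load's shadow. On IA64 it is the compiler's job to do that sliding ahead of time.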

There's nothing new under the Sun.

And I think it is fair to say they have good evidence
to "prove" the "dumb machine" statement. Read the
section titled "IA64: a smart compiler and a dumb
machine" beginning on page 2:

alphapowered.com

Also, from that paper:

"In the early 1990s, we designed a VLIW version of Alpha
similar to IA64 [1,2,3,4,5,6]. During this process we
discovered that most of the compiler technology for a
VLIW processor COULD EQUALLY BE APPLIED TO A RISC
PROCESSOR, and by avoiding IA64-style extensions to Alpha,
we could also implement an out-of-order processor."

(My CAPS).

Now maybe you would have us all believe there is some
sort of "secret knowledge" only the mighty Intel/HP can
come up with... Prove it! You can't, and Paul DeMone is
one of the few to show that the "Emperor has no
clothes."

But oh... just wait until it comes out blah blah blah.

Which quarter? Which year?

Rob



To: Tenchusatsu who wrote (109154)9/1/2000 10:55:44 AM
From: pgerassi
 
Dear Tench:

The "smart compiler on a dumb machine" phrase refers to the design goal of EPIC (Explicitly Parallel Instruction Computing). The Itanium is an in-order processor, which is much easier to build than a smarter out-of-order processor. The "explicitly parallel" part means that the compiler schedules the work of each functional unit in the Itanium. This moves even more intelligence out of the CPU and into the compiler, so the compiler must optimize the instruction stream for each functional unit to get maximum performance. That is why the compiler must be very smart, and why the statement is accurate in comparison to the P3, P4, K6, and Athlon.
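A rough illustration of what "explicit" scheduling means (a hypothetical three-slot machine sketched in Python; this is not real IA-64 bundle encoding): the compiler has already grouped independent operations into issue groups, and the machine just plays them back, one group per cycle, with no reordering of its own.

```python
# The compiler's output: each inner list is one issue group. The two loads
# are independent, so the compiler packed them together; the add and store
# each wait a cycle for their inputs. All of this is decided at compile time.
schedule = [
    ["ld r1=[a]", "ld r2=[b]", "nop"],
    ["add r3=r1,r2", "nop", "nop"],
    ["st [c]=r3", "nop", "nop"],
]

def run(schedule, slots=3):
    """The 'dumb machine': execute groups in order, return the cycle count."""
    for cycle, group in enumerate(schedule):
        assert len(group) == slots, "the compiler must fill every slot"
        print(f"cycle {cycle}: " + " | ".join(group))
    return len(schedule)

cycles = run(schedule)   # 3 cycles, fixed entirely at compile time
```

Note who pays for unused capacity: the nops. On a real OOO machine the hardware would find that parallelism itself; here, if the compiler doesn't find it, nobody does.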

Designing the Itanium itself should be easy, since it does not need to do out-of-order execution or schedule functional units; that logic is where much of the complexity of a mainstream processor goes. The problem is that neither the hardware (how Intel missed its frequency target is hard to understand) nor the compiler (much easier to see, as the compiler is typically the Achilles' heel of EPIC) is ready. The only way to explain these problems is that they are trying to put some smarts back into the CPU to help the compiler out.

The problems with EPIC are many. Here are a few:

1) Because the compiler must optimize for the functional units in a given EPIC CPU, every program must be recompiled for each different EPIC CPU. If one adds or deletes a functional unit, or changes the latency of an operation, the program is no longer optimal: with too few units the code stalls, and extra units simply go unused. Doubling the L1 or L2 cache produces the same effect. The memory subsystem must also be known to the compiler, since it has to be taken into account as well. Thus a recompile of all applications is required for optimal performance each time the hardware configuration changes. This will be hard given the way the software side of things is architected (and it is also why Linux and open source are a necessity for Itanium).

2) Since the optimization is done at compile time, if the assumptions made are wrong, there is currently no way to fix it. For example, if two consecutive runs invert the probabilities of most branches taken, and the first case was considered typical by the compiler, the second run will be fraught with stalls, delays, and bubbles, and will run very slowly compared to the first. Run times thus become very difficult to predict and frustrating for users. In addition, any required real-time processing will have a difficult time making sure that response times stay within specification. These are the kinds of loads commonly found in applications such as web serving and heavily loaded multi-tasking computers.

3) Multi-tasking and interrupt-driven software disrupt many of the optimizations performed by compilers for EPIC CPUs. Interrupts and context switches act like so many mispredicted branches: they thrash the caches and cause frequent pipeline stalls.
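Point 2 can be sketched numerically. In this Python toy model the costs (1 cycle for the path the compiler laid out, 10 for the wrong path) and the 90/10 branch profiles are invented for illustration, but they show how one binary behaves very differently on two runs:

```python
# Hypothetical costs: the compiler arranged the code assuming the branch
# is usually taken, so the taken path is cheap and the other path is not.
FAST, SLOW = 1, 10   # cycles per branch: compiler-expected path vs. wrong path

def run_cost(taken_trace, compiled_for_taken=True):
    """Total branch cost of executing one trace on the same static binary."""
    return sum(FAST if taken == compiled_for_taken else SLOW
               for taken in taken_trace)

# First run matches the profile the compiler assumed (90% taken)...
run1 = [True] * 90 + [False] * 10
# ...second run inverts it (10% taken): same binary, very different speed.
run2 = [True] * 10 + [False] * 90

print(run_cost(run1))   # 90*1 + 10*10 = 190 cycles
print(run_cost(run2))   # 10*1 + 90*10 = 910 cycles
```

Same program, same inputs per branch, nearly 5x slower overall, and nothing in the in-order hardware can adapt; only a recompile against the new profile would fix it.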

Such workloads run on the majority of servers, which are Itanium's intended market. The best fit for EPIC CPUs is the embedded market, where the problem can be narrowly defined and does not change over time: the opposite of what one expects from a general-purpose CPU.

Pete