Technology Stocks : Intel Corporation (INTC)


To: Joe NYC who wrote (158788)2/14/2002 3:22:48 PM
From: Tenchusatsu
 
Joe,

[WBMW] and speculated execution and predication will give enormously high branch prediction rates.

Actually, this is somewhat misleading. Predication doesn't improve the branch prediction rate. Instead, it helps to reduce the number of branches in the code in the first place. The fewer branches you have, the fewer branch stalls due to mispredicts.
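To make the point concrete, here is a rough sketch of if-conversion, the transformation that predication enables. It is written in Python only to show the idea; the function names are made up for illustration, and a real IA-64 compiler would emit a compare that sets a pair of predicate registers followed by instructions guarded by those predicates.

```python
def abs_diff_branchy(a, b):
    # One conditional branch: the hardware has to predict which way it goes.
    if a > b:
        return a - b
    return b - a


def abs_diff_predicated(a, b):
    # If-converted form: no branch is left to predict.
    p = a > b          # compare once; p and (not p) act as the predicate pair
    t = a - b          # would be issued under predicate p
    f = b - a          # would be issued under predicate (not p)
    # Only the result whose predicate is true contributes (True == 1, False == 0).
    return t * p + f * (not p)
```

Both versions return the same value. The second one does a little more arithmetic but has nothing to mispredict, which is the trade being described above.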

As for speculation, I don't see how that relates at all to branch prediction. Speculation is basically prefetching done in the assembly code.

<I think McKinley has more execution and address calculation blocks than Merced. Are these explicitly addressed by the compiler, that is, does the compiler know how many there are>

Itanium is not a completely static architecture, unlike most versions of VLIW. This is a common misconception.

In this case, assembly instructions will still be arranged into bundles of three. The fetch unit takes each bundle, and the decode unit (or whatever) unpacks the instructions in each bundle. Then all of the unpacked instructions get sent to the various execution units.
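For anyone curious what "unpacking a bundle" amounts to, here is a minimal sketch. It assumes the published IA-64 bundle layout (128 bits: a 5-bit template field in the low bits, followed by three 41-bit instruction slots). The function name is hypothetical, and this only splits the fields; it does not decode the instructions themselves.

```python
SLOT_BITS = 41
TEMPLATE_BITS = 5
SLOT_MASK = (1 << SLOT_BITS) - 1


def unpack_bundle(bundle):
    """Split a 128-bit bundle (given as an int) into (template, (slot0, slot1, slot2))."""
    if not 0 <= bundle < (1 << 128):
        raise ValueError("bundle must fit in 128 bits")
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = tuple(
        (bundle >> (TEMPLATE_BITS + SLOT_BITS * k)) & SLOT_MASK
        for k in range(3)
    )
    return template, slots
```

The template field is what tells the hardware which kind of execution unit each of the three slots is meant for.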

The bundling of instructions helps to resolve some of the dependencies between instructions in a block of code. That simplifies the job of the front-end part of the Itanium pipeline. The back-end part (the execution units) resolves the rest of the dependencies and is actually able to execute instructions out-of-order, to an extent. That's how additional execution units in McKinley can help.

Tenchusatsu



To: Joe NYC who wrote (158788)2/14/2002 3:23:26 PM
From: wanna_bmw
 
Joe, Re: "While Itanium can perform well as a database server, where there is a lot of repetition/parallelism in sorting and comparisons, there is very little parallelism in business logic."

Define "business logic". In terms of software that small, medium, or large businesses use, there are many different genres that software can be divided into. Financial software, for example, can have a lot of parallelism in it. So can any software that produces graphs, vectors, or analytical output. Applications that create content, that refine images, sound, or multimedia, applications that render 3D, transform data, or display graphical formats can all have greater levels of extracted parallelism. I think you are underestimating just how many applications Itanium can be good at.

Re: "Is this a compiler generated execution and prediction? I didn't know Itanium had any built in capacity for speculative execution and prediction."

Check out this early presentation. It will explain a lot.

developer.intel.com

Re: "I think McKinley has more execution and address calculation blocks than Merced. Are these explicitly addressed by the compiler, that is, does the compiler know how many there are, or is the executable in sort of intermediate form, and the processor assembles the intermediate instructions to what gets actually executed?"

Unlike x86, IA-64 has the compiler pack instructions into bundles, each of which uses one of several predefined templates. See this slide.

developer.intel.com

McKinley has two more integer / memory execution blocks than Merced, but it does not add any more templates to the library, so the compiler does not need to be updated. In Merced, if there are no available execution blocks to service a given template, the bundle is held for a clock cycle until more resources are available. Adding more execution blocks decreases the probability of this happening. With two bundles of three instructions each, an IA-64 CPU can issue as many as six instructions every clock cycle. To be fair, most software doesn't have enough parallelism to fill all six slots on every clock cycle; however, new compiler techniques for the Itanium architecture are appearing all the time, and each brings new algorithms for keeping those slots filled.
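Here is a toy model of why the same binary can run faster with more execution blocks. It is only a sketch: each bundle is written as its template letters (M = memory, I = integer, F = floating point, B = branch), up to two bundles issue per cycle, and a second bundle waits a cycle if the units it needs are already taken. The unit counts below are illustrative assumptions, not official figures, and the model ignores data dependencies, stop bits, and everything else a real pipeline has to handle.

```python
from collections import Counter


def cycles_needed(bundles, units, width=2):
    """Count cycles to issue `bundles` in order, at most `width` bundles per cycle."""
    cycles = 0
    i = 0
    while i < len(bundles):
        demand = Counter()
        issued = 0
        while i < len(bundles) and issued < width:
            trial = demand + Counter(bundles[i])
            if any(trial[u] > units.get(u, 0) for u in trial):
                break  # not enough units left this cycle; bundle waits
            demand = trial
            issued += 1
            i += 1
        if issued == 0:
            raise ValueError("bundle %r can never issue with these units" % bundles[i])
        cycles += 1
    return cycles


# Same "binary" (bundle stream) on two hypothetical machines.
code = ["MMI", "MMI", "MII", "MMF"]
fewer_units = {"M": 2, "I": 2, "F": 2, "B": 3}   # assumed, for illustration
more_units = {"M": 4, "I": 2, "F": 2, "B": 3}    # assumed, for illustration
```

With these made-up numbers, `cycles_needed(code, fewer_units)` comes out to 4 and `cycles_needed(code, more_units)` comes out to 2. The bundle stream and the templates are identical in both runs; only the number of memory units changed.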

"Or how else will existing code take advantage of additional resources provided by McKinley? Or does the software need to be recompiled for each processor?"

At MPF last fall, Intel mentioned that McKinley would offer 1.5x or greater performance gains over Merced by using the same binaries, and recompiled binaries will offer as much as 2x performance in certain situations. Clearly, there are some features that only improve performance in specifically optimized code, but it seems like there will still be plenty of performance, even without having to recompile.

wbmw