wbmw,
> IA-64 was designed to extract more parallelism from code
I think it is the compiler that is supposed to extract the parallelism, but there is a limit to how much parallelism the code actually contains: dependencies, branches, and loops all get in the way.
While Itanium can perform well as a database server, where there is a lot of repetition and parallelism in sorting and comparisons, there is very little parallelism in business logic.
> and speculative execution and predication will give enormously high branch prediction rates.
Is this speculation and predication generated by the compiler? I didn't know Itanium had any built-in hardware capacity for speculative execution or prediction.
> Additionally, McKinley will have 6 parallel integer / memory execution blocks, which should offer far more throughput than any current x86 processor.
I don't know how much these will contribute. In the x86 world, additional execution blocks offer diminishing marginal gains. Theoretically, I guess the compiler can take advantage of all of them.
BTW, I think McKinley has more execution and address calculation blocks than Merced. Are these explicitly addressed by the compiler, that is, does the compiler know how many there are? Or is the executable in some sort of intermediate form that the processor translates into what actually gets executed?
Or how else will existing code take advantage of the additional resources McKinley provides? Or does the software need to be recompiled for each processor?
Joe