To: Daniel Schuh who wrote (14252)
From: Justin Banks
11/18/1997 5:53:00 PM
Dan - While I've not got any URLs (this one is close to the chest as of yet), lots of the IA64 stuff seems reminiscent of PlayDoh (HP); see hpl.hp.com. Note that some of the players have moved to the IA64 design team. Here's some anon. stuff posted by someone who really knows chips:

>> PlayDoh
Almost everything I saw yesterday at Crawford's talk was in HP Labs' PlayDoh. While it was interesting in 1994, it didn't garner that much interest. (Of course in those days they didn't have their marketing dept invent a new four letter acronym for it!) It will get a lot of interest now for marketing reasons, but that just indicates that Intel/HP don't have a major technical innovation in IA-64 (at least in the little they presented so far). If they had, PlayDoh would have been picked up by others after it came out. See hpl.hp.com for ordering information. There is an abstract in hpl.hp.com. The other major reference for things along these lines is Wen-mei Hwu's publications.

>> In-order vs. Out-of-order
HP/Intel seem to be setting themselves up for an in-order vs. out-of-order battle where they've staked out the in-order side. (While nothing is impossible, predicated instructions in the style of IA-64 would not be any architect's first choice when building an out-of-order processor.) It is unclear that they've made the right choice. Generally it has been found in the past (e.g. branch prediction and load/store reordering) that there is some information that is not available at compile-time, and so it is possible to do better at execution-time in the processor. HP/Intel seem to be saying that they believe all such instances (even the ones they haven't thought of yet) can be overcome with ISA features.

>> Predication vs. out-of-order
Why is predication unfriendly to out-of-order? Because the concept of conditionally writing a register is antagonistic to register renaming.
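To make the renaming conflict a bit more concrete, here is a minimal Python sketch of a toy rename table. It is only an illustration under stated assumptions: `Renamer`, `lookup`, and `rename` are hypothetical names, and nothing here is taken from the IA-64 or PlayDoh material. It assumes the simplest scheme, where every destination gets a fresh physical register at rename time, so a predicated write has to carry the old destination value along in case the predicate turns out false.

```python
from dataclasses import dataclass, field


@dataclass
class Renamer:
    """Toy rename table: architectural register name -> physical register index."""
    table: dict = field(default_factory=dict)
    next_phys: int = 0

    def fresh(self) -> int:
        # Hand out the next unused physical register (no free list in this toy).
        p = self.next_phys
        self.next_phys += 1
        return p

    def lookup(self, arch: str) -> int:
        # Current physical register holding the architectural register's value.
        if arch not in self.table:
            self.table[arch] = self.fresh()
        return self.table[arch]

    def rename(self, dest: str, srcs: list, pred: str = None):
        """Rename one instruction; return (physical dest, physical sources)."""
        phys_srcs = [self.lookup(s) for s in srcs]
        if pred is not None:
            # Assumption: a predicated op must also read the predicate and the
            # OLD value of its destination, to copy it forward if pred is false.
            phys_srcs.append(self.lookup(pred))
            phys_srcs.append(self.lookup(dest))
        new_dest = self.fresh()
        self.table[dest] = new_dest
        return new_dest, phys_srcs


r = Renamer()
first_dest, plain = r.rename("r1", ["r2", "r3"])                 # r1 = r2 + r3
second_dest, predicated = r.rename("r1", ["r2", "r3"], pred="p1")  # (p1) r1 = r2 + r3
print(len(plain), len(predicated))   # 2 sources vs 4 sources
print(first_dest in predicated)      # True: depends on the earlier write to r1
```

In this toy model the predicated instruction ends up with four physical sources, one of which is the result of the previous write to the same register, so the two writes to r1 are chained together rather than independent.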
The way you would implement it if you had to is to read the destination register as an implicit source operand, so that you can write it to the renamed destination in case the predicate is false. So regular ALU instructions now have 4 source operands instead of 2 (2 real operands, the predicate, and the destination's old value). It is also the case that the predicated instruction will usually have to wait for other computations of the destination to finish before it can go (because it reads the destination). This defeats much of the purpose of predication, since the then and the else arms are supposed to run in parallel, not serialized. One way around this would be to wait for predicate evaluation before doing register renaming, but I think that would have more problems than it solves.

>> Predication vs. branches
HP/Intel portray predication as an alternative to branches. The other RISC architectures have thought this is such a good idea that they've been doing it for years. Most of our compilers have been doing "if-conversion" for some time now (since 1993 for MIPS). So what's the big deal?

>> Speculation
The big issue for speculating loads is not basic block barriers, but stores. Crawford did not mention this case, probably because it gets messier. Let's assume IA-64 is like PlayDoh on this. My memory says that the way PlayDoh handled this was to have the chk.s instruction trap if the ld.s came before a store to the same location. This required fixup code to re-execute not just the ld.s, but all its dependent instructions before the point of the chk.s. This could lead to a combinatorial explosion of fixup code the further back you try to move the ld.s. Now this code is, to first order, never executed, but I would still worry about the bloat and the inefficiency of its use when the situation does occur. (There was a reference to fixup code in the IA-64 slides, I think.)

>> Template fields
What is the real value of the template fields?
It seems to me that at best the template field saves one pipeline stage. And we could argue whether that pipe stage is before or after the Icache (i.e. it may be that the hw could use predecode into the Icache to create its own template field). Now we always try to minimize the number of pipe stages in a processor, but this seems to be going a bit far.

>> Summary
So at best HP/Intel may have a way to build in-order processors with performance approaching out-of-order processors, and they are perhaps boxing themselves into a corner. There are no breakthroughs that I can see. Is there some advantage in only having to build an in-order processor? Yes, some. But increasingly the interesting work (and complexity) is elsewhere anyway (e.g. the memory system and coherency and MP synchronization).

-justinb