Intel Investors - A Glimpse at Merced & McKinley Architectures - from Intel's Hans Mulder.
The description of the FOUR floating-point units on Merced - 2 EXTENDED Precision and 2 STANDARD Precision - suggests that MERCED may find more rapid acceptance as a POWERFUL WORKSTATION CPU - before the big SERVER ramp-up.
If that turns out to be true, Intel may be able to ramp up production on Merced faster since Workstation applications - and purchases - can be made on an as-needed basis without regard to corporations making life-or-death decisions on Merced-based Enterprise Server purchases.
On-chip L0 and L1 caches are also part of the Merced design, with OFF-chip L2 caches. McKinley will have on-chip L0, L1 and L2 caches.
Paul
{================================}
techweb.com
February 15, 1999, Issue: 1048 Section: Systems & Software
Intel's Mulder mulls IA-64
Alexander Wolfe
Dallas - Hans Mulder, principal engineer at Intel Corp. and, more importantly, one of the first at the company to recognize the potential of instruction-level parallelism, offered his take on Merced at the Micro-31 conference, held here last December. (Mulder is also a co-inventor of Patent No. 5,860,017; see story at left.)
The original team that took up the task of defining the IA-64 architecture consisted of five engineers each from Hewlett-Packard Co. and from Intel. "We basically wrote the book that contains the IA-64 instruction set," Mulder said during a luncheon keynote speech at the conference.
In 1991, Mulder said, he was the only one at Intel working full time to study the potential of 64-bit architectures. Now, more than 1,000 people are involved in the massive IA-64 project.
Nevertheless, Mulder was highly circumspect during his talk, giving a nod to his constricted position by jokingly characterizing Merced as the most talked-about piece of vaporware he'd ever seen.
On the technical side, he noted that Merced will contain full IA-32 binary compatibility in hardware. In addition, it will have a massive floating-point unit with two extended-precision multiply/accumulate (FMAC) units and two standard-precision FMACs. That complement will be able to execute up to eight standard-precision floating-point operations per instruction cycle and up to four extended-precision operations.
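The peak figures quoted above can be checked with a little arithmetic. The sketch below rests on two assumptions that the article does not state: that each FMAC counts as two floating-point operations per cycle (one multiply, one add), and that the extended-precision units can also execute standard-precision work. The function name and parameters are purely illustrative.

```python
# Rough peak-throughput arithmetic for the FMAC complement described above.
# Assumptions (not from the article): one multiply/accumulate counts as two
# floating-point operations, and the extended-precision units can also run
# standard-precision operations.

OPS_PER_FMAC = 2  # one multiply plus one add


def peak_flops_per_cycle(extended_units: int, standard_units: int) -> dict:
    """Return peak floating-point operations per cycle, by precision."""
    extended = extended_units * OPS_PER_FMAC
    standard = (extended_units + standard_units) * OPS_PER_FMAC
    return {"extended": extended, "standard": standard}


merced = peak_flops_per_cycle(extended_units=2, standard_units=2)
print(merced)
```

Under those assumptions the result is four extended-precision and eight standard-precision operations per cycle, which matches the numbers Mulder gave.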
Merced will have three levels of caches. Most interesting are its separate L0-level instruction and data caches with a latency of two cycles. L0 caches are a recent architectural trend, intended to make frequently used code more accessible to the CPU on a slightly faster basis than standard-issue L1 caches.
Merced's successor, code-named McKinley, will have its L2 cache on-die. However, Merced will be fielded as a multi-die cartridge containing custom SRAMs. One reason for the setup is the tricky task of meeting Merced's power requirements while simultaneously moving to the new physical cartridge in which it will be housed.
Tough crowd
Mulder let down his hair a bit during a sometimes raucous question-and-answer session that took place following dessert and coffee. One attendee pointedly asked him to explain the difference between very long-instruction-word architectures and EPIC (explicitly parallel instruction computing), which is Intel's characterization of IA-64's operation.
The question elicited a roar of laughter from the audience. However, Mulder didn't take the bait. "That's a good one," he answered soberly. "EPIC is a combination of techniques that goes beyond VLIW, particularly the speculation and predication mechanisms.
"I think there's a set of techniques that, if you bring them all together, you can call them EPIC. It's really to signify that it's [something] more. It's not just providing fixed-width machines, as in VLIW. It also gives you a lot of features that have never been used for scalar processing."
Beyond the purely technical, Mulder admitted that the EPIC name was part of a marketing campaign. "Now it's an accepted acronym," he said. "Maybe not yet in the academic world, but clearly in the trade press."
Copyright © 1999 CMP Media Inc.
February 15, 1999, Issue: 1048 Section: Systems & Software
Overloading on Merced
Alexander Wolfe
When it comes to reporting on Merced, sometimes there can be too much of a good thing. Take Hans Mulder, Intel principal engineer, who made a lot of interesting points in his keynote talk at the recent Micro-31 conference. They wouldn't all fit into the story that's running on page 43 of this issue. So I'm going to continue Mulder's mullings here.
Most interesting was the discussion, during the question-and-answer session following Mulder's talk, of Merced successor McKinley. The latter processor is due in late 2001. Mulder was asked how McKinley would support higher levels of instruction-level parallelism than Merced.
"The key is that we've added a few more instruction units," he explained. "Now, the interesting thing is, that doesn't necessarily mean that we'll execute wider instruction bundles [i.e., longer words]. Because it's also possible to make better use of the existing instruction templates; you can feed more functional units off of them."
McKinley opens up a can of worms, because VLIW-like (and presumably EPIC) architectures have had difficulty maintaining software compatibility as they add more functional units. Mulder said that won't be a problem, because IA-64 is defined in such a way that "you're always binary-compatible." Presumably, this will require a heavy dose of dynamic scheduling.
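One way to picture how the same binary could run on chips with different numbers of functional units is the toy model below. It is only a sketch: the article does not describe the actual mechanism, and the model simply assumes that a program is a sequence of groups of mutually independent instructions and that a chip issues as many instructions from the current group as it has units. The instruction mnemonics, the unit counts, and the function name are all made up for illustration.

```python
import math

# Toy model (an assumption, not Intel's design): a program is a list of
# groups, each holding instructions that do not depend on one another.
# A chip with `units` functional units issues up to `units` instructions
# from the current group per cycle and never mixes two groups in a cycle.


def cycles_needed(groups: list, units: int) -> int:
    """Count cycles to issue every group on a machine with `units` units."""
    return sum(math.ceil(len(group) / units) for group in groups)


# The same "binary" for both machines; the mnemonics are placeholders.
program = [
    ["ld", "ld", "add", "mul"],
    ["add", "st"],
    ["fma", "fma", "fma", "fma", "br"],
]

narrow = cycles_needed(program, units=3)  # hypothetical narrower chip
wide = cycles_needed(program, units=6)    # hypothetical wider chip
print(narrow, wide)
```

In this model the program itself never changes; the wider machine just drains each group in fewer cycles, which is the flavor of compatibility Mulder was claiming.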
In that regard, Mulder also gave a nod to the importance of software in making IA-64 a success. "It's very clear that the basic philosophy is that compilers provide better performance by finding parallelism in the software."
Returning to his main theme, which seemed to be that IA-64 takes the best from all architectural styles to deliver the best to all possible users, Mulder made some closing remarks. "The key aspect of IA-64 is that it will help you with scalar code," he said. "The reason it will do so well on scientific code is because we threw all those functional units in. But [our use] of predication and speculation will help greatly with scalar code."
(Predication removes unnecessary branches from an application program, while speculation masks memory latency by executing load instructions as soon as possible.)
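The two ideas in that parenthetical can be sketched in a few lines. This is a software analogy under stated assumptions, not IA-64 code: the function names are hypothetical, the `DEFERRED` marker is a stand-in for whatever the hardware uses to postpone a fault, and memory is modeled as a plain dictionary.

```python
# Predication, as a toy: instead of branching, compute a predicate,
# evaluate both arms, and keep only the result whose predicate is true.


def abs_diff_branchy(a: int, b: int) -> int:
    # Conventional form: control flow depends on the comparison.
    if a > b:
        return a - b
    return b - a


def abs_diff_predicated(a: int, b: int) -> int:
    p = int(a > b)        # predicate: 1 if a > b, else 0
    taken = a - b         # arm guarded by p
    not_taken = b - a     # arm guarded by not-p
    return p * taken + (1 - p) * not_taken


# Speculation, as a toy: issue the load before knowing whether its value
# is needed. A bad address yields a marker instead of an immediate fault.

DEFERRED = object()  # hypothetical stand-in for a postponed fault


def speculative_load(memory: dict, addr: int):
    return memory.get(addr, DEFERRED)


def use_value(memory: dict, addr: int, needed: bool) -> int:
    value = speculative_load(memory, addr)  # hoisted above the test
    if not needed:
        return 0                  # speculated result is simply discarded
    if value is DEFERRED:
        raise KeyError(addr)      # fault surfaces only when the value is used
    return value
```

Both versions of `abs_diff` return the same answer; the predicated one just has no branch to mispredict. In the speculation half, a load from a bad address is harmless as long as the value turns out not to be needed.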
Asked if there were other mechanisms besides predication and speculation that would help with instruction prefetching in IA-64, Mulder had a succinct answer. "Yes," he said.
Copyright © 1999 CMP Media Inc.