Technology Stocks : AMD/INTC/RMBS et ALL


To: wily who wrote (170) 10/12/1999 8:56:00 PM
From: wily
eetimes.com

Microprocessor Forum: Designers cut fresh paths to parallelism
By Rick Merritt
EE Times
(10/08/99, 2:25 p.m. EDT)

SAN JOSE, Calif.: The struggle to create high-performance processors is leading such giants as Compaq Computer Corp., Intel Corp. and IBM Corp. to adopt widely differing techniques in an increasingly complex search for parallelism. Each company's route has its own trade-offs, and each represents, in silicon, the space the designer hopes to occupy in the marketplace.

The companies mapped out their plans this week in separate presentations at the Microprocessor Forum here. Intel has staked out fresh ground in instruction-level parallelism (ILP) with its newly christened Itanium architecture (formerly known as Merced), which relies heavily on compiler technology and on new techniques such as speculative processing. IBM took a hardware approach, chip multiprocessing (CMP), by putting two cores on its Power4 processor. And Compaq took a unique tack toward thread-level parallelism (TLP) with its EV-8, a single chip that acts like a virtual four-way symmetric multiprocessor.
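
One way to make "instruction-level parallelism" concrete is to compare how many instructions a block contains with the length of its longest chain of dependent instructions. The sketch below is purely illustrative. It assumes every instruction takes one cycle and that the machine can issue any number of independent instructions at once, and the function name `ilp` and the two example blocks are hypothetical. It shows why the same eight-value sum offers more parallelism when written as a tree than as a running total. That kind of independence is what a compiler-driven design like Itanium has to find and expose.

```python
def ilp(deps):
    """deps maps each instruction to the instructions whose results it reads.

    Assumes one-cycle instructions and an unlimited-width machine, so the
    critical path (longest dependence chain) bounds the run time.
    Returns (instruction count, critical-path length, average ILP).
    """
    depth = {}

    def d(i):
        # depth of i = 1 + depth of its deepest producer (0 if it reads only inputs)
        if i not in depth:
            depth[i] = 1 + max((d(p) for p in deps[i]), default=0)
        return depth[i]

    path = max(d(i) for i in deps)
    return len(deps), path, len(deps) / path


# Summing eight inputs x0..x7 two ways (the x's are inputs, not instructions).
serial = {f"s{k}": ([f"s{k - 1}"] if k > 1 else []) for k in range(1, 8)}
tree = {"a": [], "b": [], "c": [], "d": [],
        "e": ["a", "b"], "f": ["c", "d"], "g": ["e", "f"]}
print(ilp(serial))  # (7, 7, 1.0)
print(ilp(tree))    # (7, 3, 2.3333333333333335)
```

Real processors cap this figure with a fixed issue width and multi-cycle latencies, so it is an upper bound rather than a prediction.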

"Ultimately, everybody will do all of it," said Michael Slater, principal analyst with MicroDesign Resources and host of the forum, but "initially they are all trying to solve different problems and have unique starting points."

Slater and other analysts said that while Intel's Itanium is expected to fare well when it debuts next year, the use of instruction-level parallelism may be coming to an end for new architectures as designers begin to see the advances in chip- and thread-level parallelism that IBM, Compaq and others are leveraging.

Indeed, keynoter John Hennessy, co-developer of the first commercial RISC chip and professor of electrical engineering and computer science at Stanford University, cited a looming transition away from ILP.

"These techniques are getting ever more complicated. I don't see any performance wall, but there are steeper slopes ahead," he said, noting the complexity of using techniques such as trace caching and value speculation. Bigger advances will come as designers embrace parallelism through multithreading, but that requires a significant transition, he added.

"We are entering a domain where designers need to employ multiple threads, and that requires software support," Hennessy said. "That means we have to help software guys think of new ways to deal with parallelism. It's time we get started on the process of moving to multithreaded software models."

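Hennessy's point about software can be illustrated with the simplest multithreaded pattern: split one job into independent pieces and give each piece its own thread. The sketch below is a hypothetical example using Python's standard `concurrent.futures` module. The function name and the default of four threads are arbitrary choices. In CPython the global interpreter lock keeps these threads from performing the additions truly in parallel, so the example shows the decomposition the programmer must supply, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_sum(values, n_threads=4):
    """Sum `values` by splitting them into contiguous chunks, one task per thread."""
    # ceiling division so every value lands in some chunk; at least 1 keeps range() valid
    chunk = max(1, (len(values) + n_threads - 1) // n_threads)
    chunks = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        partials = list(pool.map(sum, chunks))  # each chunk is summed as a separate task
    return sum(partials)  # combine the per-thread partial results


print(parallel_sum(list(range(1, 101))))  # 5050
```
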
Thread processing units

Taking a step in that direction, Compaq described its EV-8, an Alpha processor that can execute as many as eight instructions in a clock cycle. To exploit that potential fully, Compaq uses out-of-order execution and special fetching techniques to create four virtual "thread processing units." These make the CPU look like a four-way multiprocessing system to high-end server software written for just such machines, such as versions of Oracle's database.
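
A toy cycle model shows why sharing one wide core among several threads can raise throughput. Everything in it is an assumption made for illustration and does not describe the EV-8's real fetch or issue logic. Each thread's per-cycle count of newly ready instructions is invented input, instructions that miss a slot wait for the next cycle, and threads take turns getting first pick of the eight issue slots. The function name `smt_issue` is hypothetical.

```python
def smt_issue(ready_per_cycle, width=8):
    """Toy model of several threads sharing one issue engine.

    ready_per_cycle[t][c] = instructions thread t makes ready in cycle c (made-up input).
    Unissued instructions stay in that thread's backlog. Returns instructions issued per cycle.
    """
    n_threads = len(ready_per_cycle)
    n_cycles = max(len(r) for r in ready_per_cycle)
    backlog = [0] * n_threads
    issued_per_cycle = []
    for c in range(n_cycles):
        for t, ready in enumerate(ready_per_cycle):
            if c < len(ready):
                backlog[t] += ready[c]
        slots = width
        # rotate which thread picks first each cycle (round-robin priority)
        for k in range(n_threads):
            t = (c + k) % n_threads
            take = min(backlog[t], slots)
            backlog[t] -= take
            slots -= take
        issued_per_cycle.append(width - slots)
    return issued_per_cycle


one = [[3, 0, 2, 0]]                      # one thread: bursts of ILP, then stalls
four = [[3, 0, 2, 0], [0, 3, 0, 2],
        [2, 0, 3, 0], [0, 2, 0, 3]]       # four threads whose stalls fall in different cycles
print(sum(smt_issue(one)), "of", 8 * 4, "slots")   # 5 of 32 slots
print(sum(smt_issue(four)), "of", 8 * 4, "slots")  # 20 of 32 slots
```

In this model a single stalling thread leaves most slots empty, while four threads fill four times as many. A real chip's gain depends on how the threads compete for caches and other shared resources, which is why applications may need tuning to the shared hardware.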

"It looks like four chips, performs like two and is actually one chip with about 5 percent more transistors than a non-multithreaded device," said Joel Emer, an EV-8 designer and senior consulting engineer at Compaq. "This is the biggest advance since RISC was proposed 20 years ago."

The EV-8 is not scheduled to be available as a processor until 2002 and may not ship in systems until 2003. Despite that, Emer already sees a road map for extending its brand of parallelism. A next generation could be based on a wider processor that doubles the number of virtual thread units to eight.

"When you get to the limit of how wide a machine you can create, then it's time to move to chip multiprocessing," he said.

Compaq's advance is focused on extending the company's prowess in the high-end Alpha server space, where multiprocessing-ready applications already exist. Outside that space, however, there is no such code, and to use the EV-8's virtual thread processors optimally, applications would have to be tuned to the realities of the underlying hardware resources that are shared among the virtual thread processors.

"Today in big servers, there's plenty of room for parallelism at the thread level," said Linley Gwennap, a consultant with MicroDesign Resources. "The real question is, as you move down to smaller servers, workstations and desktops, how much parallelism is there at the thread level?"

Preserving a high-margin business among top-tier servers was also the rationale behind IBM's foray into a new form of parallelism, in its case with chip-level multiprocessing. Because IBM was aiming at a class of servers that might sell from $3,000 to $3 million, it could afford the costs of putting two processors on a single large (and sometimes hot) die, according to Slater.

"IBM's Power4 could do things with pin count and power dissipation Intel would never try, because IBM felt bandwidth was the overriding issue," said Slater. "Intel, for its part, was trying to design a microprocessor that could be used by hundreds of OEMs for a wide range of systems, some of them ultimately selling at fairly low cost."

As chip makers continue to follow Moore's Law, however, Slater said CMP will become much more pervasive. Indeed, Advanced Micro Devices said it will put two cores on a future-generation x86 processor, and papers on such embedded processors as Sun's MAJC and Cradle's Universal Microsystem also described chips that use multiple processor cores on a die.

Just how far the approach can go is one question. "We will put two to eight processors on a die," said Marc Tremblay, chief designer of MAJC, but "beyond that we may need a new technology . . . Does all the research that has been done in mesh structures and hypercubes still apply when we have these extremely fast interconnects on-chip? This will be an important research area."

Intel eschewed radical new techniques such as multithreading and chip multiprocessing, at least for the moment, to make its own radical leap to a new instruction set architecture with Itanium. By adopting a fresh architecture, it hopes to leave the door open to another decade of wringing performance gains from instruction-level parallelism.

With Itanium, Intel wanted to make hardware scheduling decisions visible to the compiler, which could then make its own optimizations in a manner not unlike what the first RISC processors attempted, Fred Pollack, a senior Intel designer, said in a panel session here. Intel also employed new techniques such as predication and speculation. Predication guards instructions with true/false predicates, so both sides of a branch can execute and only the results whose predicate is true take effect. Speculation lets a processor anticipate and perform calculations before it can fully check whether they are needed or whether their data is valid.
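
Predication rests on if-conversion. A branch is replaced by a compare that sets a predicate, both paths are computed, and the predicate selects which result is kept. The Python below is a hedged, source-level sketch of that transformation, not IA-64 code, and the functions and arithmetic are made up for illustration.

```python
def branchy(x, a, b):
    # conventional form: a conditional branch picks one path
    if x > 0:
        return a + 1
    return b - 1


def predicated(x, a, b):
    p = int(x > 0)        # compare sets a 0/1 predicate
    then_val = a + 1      # both paths execute unconditionally...
    else_val = b - 1
    # ...and the predicate selects which result survives, with no branch
    return p * then_val + (1 - p) * else_val


for x in (-2, 0, 3):
    assert branchy(x, 10, 20) == predicated(x, 10, 20)
```

The predicated form always spends execution resources on both paths. When those paths include loads, that also means extra memory traffic for results that are then discarded.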

Critics were quick to pounce on Intel's approach. The results of speculative executions are often tossed out when the data cannot be validated, several people noted. In addition, the practice of "profiling" code for Itanium processors creates an unnecessary software burden, IBM fellow Martin Hopkins said in a panel session here.
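
Control speculation can be sketched as a load hoisted above the test that guards it, with any fault deferred instead of raised. IA-64 marks such a deferred fault with a NaT ("not a thing") bit that a later check instruction tests. The Python below only imitates that idea with a sentinel object. The names `NAT`, `speculative_load` and `guarded_double`, and the dictionary standing in for memory, are hypothetical.

```python
NAT = object()  # sentinel standing in for a deferred-fault ("not a thing") marker


def speculative_load(memory, addr):
    """Load issued early: a bad address yields NAT instead of raising."""
    try:
        return memory[addr]
    except KeyError:
        return NAT


def guarded_double(memory, addr, valid):
    v = speculative_load(memory, addr)  # hoisted above the `valid` test
    r = NAT if v is NAT else v * 2      # dependent work proceeds speculatively
    if not valid:
        return None                     # guard failed: speculative work is thrown away
    if r is NAT:
        return memory[addr] * 2         # deferred fault found: redo non-speculatively, so a real fault raises now
    return r


mem = {0x10: 7}
print(guarded_double(mem, 0x10, True))   # 14
print(guarded_double(mem, 0x20, False))  # None
```

If the guard turns out to be false, the load and the multiply were wasted work. That is the cost the critics point to.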

"We looked at predication and thought it would be great, but in terms of memory bandwidth it turned out it would hurt us as much as help us," said Michael Shebanow, chief technical officer at HAL Computer Systems, who detailed a 64-bit Sparc processor.

Like the move to multithreading, Intel's embrace of a new instruction set architecture has a significant cost in moving software developers to a new platform. Intel has already set up a $250 million fund for IA-64 software development.

Despite the skeptics, Intel is moving ahead with Itanium. Samples and working systems are already running in the labs, and system shipments are expected in the middle of next year. The architecture should issue a peak of six instructions per clock cycle, Intel said here.

Analysts are bullish on Itanium. "Intel is really ahead here because they will ship something next year," said Gwennap. "The other guys are being forced to disclose stuff far out into the future to seem competitive.

"Next year Compaq will still be shipping its EV-6, a four-year-old design; Hewlett-Packard is still shipping its old PA-8000 design; and Sun's Ultrasparc doesn't look that great either. Merced will look good in the field. The barrier is not that high."

And nothing prevents Intel, once it refines its Itanium core in the next-generation McKinley chip, from adopting techniques previewed by IBM and Compaq.

"Beyond McKinley, there's no reason Intel could not go to some form of multithreading or chip multiprocessing," said Gwennap. "A year or two ago Intel just didn't get thread-level processing at all, because they were so focused on instruction-level parallelism, but I think they are getting it now."