Next Cyrix core aims at 600-MHz Pentium II performance level -- Hot CMOS, deep superscalar spice Jalapeno
Oct. 09, 1998 (Electronic Engineering Times - CMP via COMTEX) -- Dallas - Armed with a hot CMOS process from parent National Semiconductor Corp. and an aggressive superscalar architecture, Cyrix is taking aim at the high end of Intel's IA-32 processor line. The company is planning to pop its next hot CPU core, code-named Jalapeno, just in time to catch Intel in mid-transition from IA-32 to Merced.
"We are aiming Jalapeno at the 600-MHz Pentium-II performance level," said Mark Bluhm, Cyrix vice president of engineering. "That will be much faster than any estimates we have seen of Merced speed on IA-32 code, and it should be competitive with Intel's IA-32 high end at the time."
According to the company, Jalapeno represents both solid engineering developments on existing themes and some significant departures from Cyrix's traditional way of doing things.
In its traditional approach, Cyrix prided itself on high instructions per clock, trying to execute most X86 instructions in a single cycle. The architects argued that this higher efficiency would lead to better overall performance, even if it caused circuit complexities that limited maximum clock frequency.
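The trade-off Cyrix's architects were making can be written as the standard first-order relation between delivered performance, instructions per clock (IPC) and clock frequency. The figures below are purely hypothetical, chosen only to show how a higher-IPC core can outrun a faster-clocked one; they are not Cyrix or Intel measurements.

```latex
\text{Performance} \propto \text{IPC} \times f_{\text{clk}}

% Hypothetical illustration only:
1.3 \times 250\,\text{MHz} = 325 \times 10^{6}\ \text{instructions/s}
\quad > \quad
1.0 \times 300\,\text{MHz} = 300 \times 10^{6}\ \text{instructions/s}
```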
The company won the battle, demonstrating benchmark performance on its M-II processor that exceeded the performance of higher-frequency Intel CPUs.
But at the same time, Cyrix was losing the war. "The reality is that retail end users don't buy on performance; they buy on clock frequency," stated Stan Swearingen, Cyrix vice president for desktop products. "The irony is that because we had done a better job on core design, we were getting beaten on the retail shelf."
This experience led to a new way of thinking. "Before, we would compromise on maximum clock rate to get better execution speed," Swearingen said. "Starting with Jalapeno, we decided to put the priority on megahertz, and not to trade them easily for higher performance."
Core checklist
The new approach guided the Jalapeno team through a checklist of improvements over the current M-II core. "There were things to do to make the core run faster," Bluhm said, "like larger caches, more efficient pipelines and more execution units."
The latter capability conceals some surprises that Cyrix is saving for its paper at this year's Microprocessor Forum, Bluhm said. "In general, I don't think it's reasonable to issue more than a couple or three instructions per clock in any general-purpose instruction set, especially the X86 instruction set," Bluhm said. "But it turns out that you can definitely benefit from having more than a couple of execution units in the core."
The apparent contradiction may stem from Jalapeno's decision to tune for megahertz. In order to get the highest possible clock rate, some critical paths have to be shortened. That, in turn, makes it impossible to keep some complex (and infrequent) operations to a single cycle. The architect has the choice of extending the pipeline to give the operation more cycles in which to complete, or of permitting the operation to stall the pipeline for a cycle or two.
If the architect takes the latter choice (and many feel that shorter pipelines are better pipelines), then additional execution units could in fact increase throughput. If a floating-point operation stalled the FP pipe for a cycle, for instance, consecutive FP instructions would stall the processor. But adding another FP pipe, even one that didn't support all possible FP instructions, could prevent the dispatch stall.
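The dispatch-stall argument can be made concrete with a toy model. This is a minimal sketch under stated assumptions, not a description of Jalapeno's actual pipeline: it assumes in-order dispatch of at most one FP instruction per cycle, and that each FP operation ties up whichever FP pipe it enters for `busy_cycles` cycles (2 here, i.e. the one-cycle stall described above). The function name `dispatch_cycles` and its parameters are hypothetical.

```python
def dispatch_cycles(n_ops: int, n_pipes: int, busy_cycles: int) -> int:
    """Cycles needed to dispatch n_ops back-to-back FP instructions (toy model)."""
    # free_at[p] is the first cycle in which FP pipe p can accept a new op.
    free_at = [0] * n_pipes
    cycle = 0  # earliest cycle the next instruction may dispatch (in order)
    for _ in range(n_ops):
        # Send the op to whichever pipe frees up soonest.
        pipe = min(range(n_pipes), key=lambda p: free_at[p])
        start = max(cycle, free_at[pipe])  # stall if that pipe is still busy
        free_at[pipe] = start + busy_cycles
        cycle = start + 1  # at most one FP dispatch per cycle
    return cycle


for pipes in (1, 2):
    total = dispatch_cycles(n_ops=4, n_pipes=pipes, busy_cycles=2)
    print(f"{pipes} FP pipe(s): {total} cycles, {total - 4} stall cycles")
```

In this model, four consecutive FP instructions take 7 cycles with a single FP pipe (three stall cycles) but only 4 cycles with two pipes, because the second pipe absorbs each operation's extra busy cycle. Issue width never exceeds one FP instruction per clock, which is consistent with Bluhm's point that extra execution units can pay off even when issue width stays modest.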
Another potential consumer of execution units is SIMD processing. Jalapeno will have to face Intel processors equipped with the Katmai New Instructions (KNI), Intel's second-generation version of MMX. Cyrix is in the process of deciding whether it will remain with the AMD 3DNow instruction-set extensions or move to an implementation of KNI.
"We've heard from some ISVs that KNI is superior to 3DNow," Swearingen said. "There's discussion on whether AMD intends to evolve 3DNow in that direction, and whether we might go there independently. One issue is that KNI may use 128-bit registers and some other things that you can't just adapt to with some simple pipeline changes. We are still looking into that."
Whatever the decision on execution units, Jalapeno will clearly require massive amounts of memory bandwidth to keep it fed. In part, this need will be met by more and larger on-chip caches. While he would not be specific about Jalapeno's configuration, Bluhm said that large L2 caches, and even on-chip L3s, may be in order as processors move beyond 500 MHz. Such cache organizations have in the past been used in the Alpha CPU family to good effect, even though they can increase die size enormously.
There will also be a need for high main-memory speed, both in terms of bandwidth and latency. That means a fast CPU interface to the Northbridge, and a fast main-memory system.
Until recently, the Northbridge presented a dilemma. Clearly Socket 7 was running out of steam. But with no license for Intel's Slot-1 technology, and given the high cost of that interface, Cyrix appeared forced into a proprietary bus: the kiss of death in the X86 business.
But Intel has opened a fourth alternative, said Swearingen. "With the advent of Socket 370, everything changes. Here is an interface that is inexpensive and implementable. And because it uses GTL-type levels, it can go 133 MHz or more, much faster than Intel is pushing it."
The interface issue may be moot, since Cyrix apparently plans to integrate the Northbridge into Jalapeno. That gives the architecture further advantages. Eliminating the bus crossing between the last cache and the DRAM controller can gain not only reduced latency but also the ability to do speculative and anticipatory operations that wouldn't make sense in a conventional partitioning.
Beyond the Northbridge, the need for speed persists. "You have to architect a system solution, not just a CPU core," Swearingen said. "Jalapeno will use a Rambus memory interface. That just makes sense for the time in which it is coming out, right as Rambus will be crossing over PC-100 memory buses."
All of this activity will require competitive processes. "In the past, National has lagged behind in CMOS development," Bluhm admitted, "but they are working hard to catch up."
Said Swearingen: "We are seeing speeds from National's CMOS-8 process in South Portland, Maine, that are right on top of what we were seeing from IBM's CMOS-6S-2. And we have parts in hand from a 0.18-micron process that is in development."