January 19, 1998, Issue: 989 Section: News Feature
Interfaces could help performance of DSPs, ASICs -- Fast SRAMs harbor riches still untapped
Ron Wilson
The evolution of fast SRAMs has come to a significant juncture. Microprocessor designers have encouraged development of a wealth of high-speed SRAM interfaces, but these interfaces are currently confined to use with L2 caches for processor architectures. If these rich interfaces trickle down into the commodity SRAM space (an evolution that's still far from certain), they could give system designers new options for squeezing greater performance out of digital signal processors or their own ASIC designs.
SRAM protocols have moved from asynchronous to synchronous to pipelined to double-data-rate. Electrical interfaces have evolved from TTL to low-voltage to an alphabet soup of new-and mostly untried-signaling proposals. In the process, the fast end of the SRAM market seems to have become detached from the rest of the static memory world.
"SRAM vendors have been partnering up with their customers and building whatever kind of fast parts the customer thinks might work best for them," said Matthew Arcoleo, product manager for Cypress Semiconductor Corp. "All these new ideas aren't really converging on any one thing. It is more the opposite-they are almost diverging."
It wasn't always like that. In the traditional order of things, high-speed SRAMs were the technology leaders for the SRAM industry. New ideas appeared in the very fastest parts, and then trickled down to commodity SRAMs and the mainstream of the industry.
But in response to a variety of pressures, fast SRAMs have virtually abdicated this role. The latest high-speed SRAMs are nearly application-specific chips, each intended for a single application, and their features may never migrate to commodity SRAMs. Yet if the features of the fastest, double-data-rate, low-voltage-swing SRAMs do spread through the industry, the result could be a major rethinking of systems architecture across a range of applications.
This new state of affairs is the result of one computer implementation issue: the problem of L2 cache design. As improving CMOS technology allowed microprocessor CPU clock rates to increase over the last few years, the need for large L2 caches grew. At the same time, the caches had to get faster-ideally, an L2 cache would cycle at the CPU clock rate. Half that speed is acceptable, but not great.
But CPU clock rates crossed 100 MHz without looking back, and are now approaching 300 MHz in workstations. To meet the needs of their workstation and PC customers, fast SRAM vendors had to find ways of making large, wide devices keep up with these data rates.
The L2 cache application had other peculiarities besides speed. L2 caches are for the most part burst-oriented devices. When a microprocessor reads from or writes to an L2 cache, it usually transfers an entire L1 cache line. This makes access latency less important than cycle time. In addition, most CPU board designs allow the L2 cache to be physically close to the CPU chip, and often soldered down rather than socketed. So L2 cache SRAMs live in a tightly controlled electrical environment.
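The trade-off described above can be made concrete with a little arithmetic. The sketch below uses purely illustrative latency and cycle numbers (not figures for any real part) to show why a burst-oriented cache can prefer a part with worse initial access time but a faster cycle.

```python
# Hypothetical sketch: why cycle time matters more than initial latency
# for burst-oriented L2 cache traffic. All numbers are illustrative,
# not taken from any specific SRAM.

def burst_transfer_ns(initial_latency_ns, cycle_ns, burst_beats):
    """Total time to move one cache line: the first access pays the full
    latency; each remaining beat of the burst costs one cycle."""
    return initial_latency_ns + (burst_beats - 1) * cycle_ns

# Assume a 32-byte L1 line over a 64-bit bus = 4 beats per burst.
low_latency_part = burst_transfer_ns(initial_latency_ns=8, cycle_ns=10, burst_beats=4)
fast_cycle_part = burst_transfer_ns(initial_latency_ns=12, cycle_ns=5, burst_beats=4)

print(low_latency_part)  # 38 ns
print(fast_cycle_part)   # 27 ns -- higher latency, but the burst finishes sooner
```

Under these assumed numbers, the part with 50 percent worse latency still delivers the full line almost 30 percent sooner, which is the behavior that let cache-SRAM designers trade latency for cycle time.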
Several years ago, these peculiar circumstances began to lead fast SRAMs away from their slower brethren. Designers struggling to keep up with CPU speed began to use techniques that depended on the L2 cache's unique situation, and that would not necessarily work well in traditional SRAM applications.
The first really big departure from the traditional evolution of the industry came with Intel Corp.'s Pentium CPU. In order to achieve anything close to the CPU's potential performance, the chip had to be supported by a fast L2 cache. But for cost reasons, Intel felt that the cache had to reside on the system bus, which was synchronous.
To meet these needs, Intel (large enough to put its cart before anybody's horse) specified an SRAM that would work with the Pentium bus. The device would be burst-synchronous, large and fast, with a 9-ns access time. When Intel proposed the part, the few synchronous SRAMs that existed were very fast, relatively small and quite expensive. Intel, with promises of huge market potential, lured enough SRAM vendors to its specification to make the virtually Pentium-specific device a commodity SRAM almost from the day it began shipping.
What Intel could do with a whole market, workstation vendors could try in miniature. But workstation CPUs, with their dedicated backside L2 cache ports and very high clock rates, could be even more demanding of cache SRAMs than could Pentium. Hence, when companies such as Hewlett-Packard Co., Silicon Graphics Inc. and Sun Microsystems Inc. began working with SRAM vendors, the fast SRAM began rapidly to diverge from the conventional asynchronous SRAM.
"In workstation caches, the SRAMs are often in a point-to-point connection with the CPU, so the design of the SRAM is very dependent on the processor, rather than on a bus," said Hiroyuki Goto, engineering manager at NEC Corp. This fact has allowed CPU manufacturers to experiment with, and lately, to adopt, double-data-rate protocols that would probably be impractical on a large, widely distributed bus. And it has made it possible for each CPU vendor to choose whatever electrical signaling specification seemed right at the moment.
Introducing the alphabet
The default signaling choice for everyone is low-voltage TTL. In LVTTL, you simply design conventional TTL input and output stages, but run them at a lower supply voltage. LVTTL happens kind of automatically as you reduce the supply voltage on submicron parts from 5 V to 3.3 V to 2.5 V and below.
The surprising thing about LVTTL is its durability. A year ago, many designers were predicting that LVTTL at 3.3 V would run out of steam at about 80 MHz and never be seen again. Now, at least in point-to-point connections, designers are much more optimistic.
"It would be very complicated to put a speed limit on LVTTL," said Cypress's Arcoleo. "So much depends on the loading, the layout and so forth. But I would say based on our current experience that in point-to-point applications, LVTTL seems quite possible at 100 MHz, assuming the parts are right next to each other and the environment is well controlled. I'd say 200 MHz would be too fast for it, however."
Arcoleo pointed out that another important variable in LVTTL is the "L."
"As long as you have good control over your process's threshold voltage, you can use LVTTL structures with very low signaling levels," he said. "You just keep turning the supply voltage down, and keep the trip point at one half of VDD. This doesn't require any special termination, fancy signaling or special transistors, and it works very well in some applications."
Arcoleo said Cypress has seen "a trend toward 1.8-V LVTTL, and even some talk about 1.5 V. The latter was more for power savings than for speed, though."
Such techniques may give LVTTL a greater life than anyone expected, or than most designers would particularly want. "In general, it is hard to run LVTTL at over 100 MHz," agreed Stan Hronik, engineering supervisor at Integrated Device Technology Inc. "But if you can keep everything clean and avoid common-mode problems, you can do better. We run 150 MHz on the lab bench."
Other sources have suggested that LVTTL still has some legs at 250 MHz, although under extremely controlled conditions and in the hands of expert designers.
But 100 MHz won't cut the mustard for 300-MHz CPUs that want 2x or even 1x cache speeds. CPU designers have pushed memory designers to come up with faster signaling.
One approach has been to start with LVTTL and attempt to fix its problems, rather than starting with a clean sheet of paper. The main result of that effort has been a specification called SSTL.
"SSTL is going to be the industry standard for microprocessor caches," said Ken Yap, strategic marketing manager at Samsung Semiconductor. "It has gained so much steam lately that we think both SRAM and DRAM will adopt it for DDR interfaces. SSTL addresses the need for termination in fast systems, works in heavily loaded as well as point-to-point topologies, and simplifies most board layout issues."
Another advantage of SSTL is that it is so close to conventional LVTTL that the two are compatible. "All SSTL does is take LVTTL, provide a reference voltage and narrow the voltage swing a bit," said Hronik. "Since you are still swinging around the LVTTL trip point, you are still LVTTL compatible."
But those few changes can make a big difference. Because SSTL sends the reference voltage over the bus along with the signals, it is nearly immune to common-mode noise, such as ground bounce. "Bounce has been one of the biggest problems with SRAM modules," Hronik said. "By working around it, SSTL opens up the use of large modules to speeds over 100 MHz."
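The common-mode immunity Hronik describes follows from SSTL's differential-style decision: the receiver compares the signal against the transmitted reference rather than against a fixed threshold, so noise that shifts both together cancels out. The sketch below is conceptual; the voltage levels are illustrative stand-ins, not the actual JEDEC-specified SSTL values.

```python
# Conceptual sketch of why a transmitted reference voltage makes SSTL
# nearly immune to common-mode noise such as ground bounce.
# Voltages are illustrative, not the formal SSTL specification values.

def sstl_receive(signal_v, vref_v):
    """An SSTL-style receiver decides the logic level by comparing the
    signal against the reference, not against a fixed trip point."""
    return 1 if signal_v > vref_v else 0

vref = 1.25             # nominal reference, half of an assumed 2.5-V supply
high, low = 1.65, 0.85  # narrow swing centered on the reference

bounce = 0.5            # ground bounce shifts signal AND reference together
assert sstl_receive(high + bounce, vref + bounce) == 1  # still reads high
assert sstl_receive(low + bounce, vref + bounce) == 0   # still reads low

# A fixed-threshold (LVTTL-style) receiver has no such protection:
fixed_trip = 1.25
print(low + bounce > fixed_trip)  # True -- a driven low is misread as high
```

Because the reference rides on the same board environment as the data lines, any bounce that corrupts one corrupts both, and the comparison survives, which is what opens large, heavily loaded modules up to speeds past 100 MHz.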
Just how far above 100 MHz is an issue under discussion. "We have heard reports of SSTL running as fast as 200 MHz," Hronik said. Samsung's Yap is even more optimistic. "SSTL can achieve hundreds of megahertz," he said. "We expect to have 300-MHz parts, achieving perhaps 500-Mbit/second data rates in DDR configurations, in the second quarter of this year."
But not everyone is so convinced about SSTL. Motorola Inc., for one, has been pushing yet another spoonful of alphabet soup-HSTL. HSTL is actually a cluster of four mutually incompatible specifications, ranging down to 1.5-V signaling levels, each using different termination resistors and drivers. Each category is intended for a different level of loading.
"We don't expect to see much activity in SSTL for fast SRAM," said Jim Sogas, product marketing manager for the Semiconductor and IC Division of Hitachi America Ltd. "SSTL is more defined for heavily loaded topologies. HSTL seems to be what designers are looking at for SRAMs tightly coupled to their controller."
Hitachi is bullish on the speed potential of HSTL. "By the year 2000 or thereabouts, we will be shipping DDR parts clocking at 300 to 350 MHz, in by-36 configurations," said Product Manager Ron Schwarer.
And that may not be the speed limit either. A 350-MHz cache would be a formidable design challenge for the controller and board layout engineers. But it should be electrically possible. Current micro ball-grid array packaging is capable of 700-MHz I/O, according to Schwarer. And relatively few changes would be necessary to get satisfactory impedance characteristics out of short pc-board runs. It's just a matter of waiting for CMOS processes to produce fast-enough SRAM cores and controllers.
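The packaging and clock figures above hang together arithmetically: DDR transfers data on both clock edges, so a 350-MHz clock implies 700 million transfers per second per pin, exactly the I/O rate Schwarer cites for the micro-BGA packaging. The clock and by-36 width come from the article; the rest is simple arithmetic.

```python
# Back-of-the-envelope bandwidth for the DDR parts described above.
# The 350-MHz clock and by-36 width are from the article; everything
# else is arithmetic.

clock_hz = 350e6
width_bits = 36

# DDR moves data on both clock edges, so the per-pin transfer rate is
# twice the clock -- matching the 700-MHz I/O cited for the packaging.
transfers_per_sec = 2 * clock_hz
print(transfers_per_sec / 1e6)  # 700.0 Mtransfers/s per pin

peak_gbits = transfers_per_sec * width_bits / 1e9
print(round(peak_gbits, 1))     # 25.2 Gbit/s peak for a by-36 part
```

That works out to a peak of roughly 3 Gbytes/s from a single device, which puts the "formidable design challenge" for the controller and board engineers in perspective.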
Some SSTL fans don't have the same optimism about HSTL. "HSTL is supposed to go a lot faster than 200 MHz," IDT's Hronik said. "But a lot of technology will need upgrading before that happens. In today's environment, I think 500 MHz is not realistic. With today's boards, packages and processes, I think people are going to find that HSTL doesn't have that much of an advantage over SSTL."
And so the debate rages. In general, SRAM vendors are of necessity aligned with the beliefs of their major customers. "It's not our business to dictate what kind of SRAMs and interfaces our customers will use," said Cypress's Arcoleo. "It's our business to listen to them."
This leads a company like Cypress, whose typical customer runs SRAMs at 50 MHz, to stick close to LVTTL. And vendors who have courted major workstation clients go after SSTL or HSTL, depending on their alliances. There are also SRAM vendors focusing on GTL+ to meet Intel's SRAM needs for the Pentium Pro and Pentium II processors. But GTL+ is already looking like a legacy specification to most vendors.
One lingering question is whether all of this activity in designing cache chips for high-end microprocessors will have any trickle-down benefit for future commodity SRAMs. Will we eventually see inexpensive, huge fast synchronous SRAMs using SSTL I/O going into DSP memory applications, or sitting in cell-phone handsets?
It is perhaps too early to predict. But there are some indications that the new ideas may spread to the commodity market as well, just as the glut of synchronous 32-k-x-9 Pentium SRAMs made those parts so cheap that everyone started using them, even in applications needing only 25-ns parts. Some SRAM vendors report that they are already having discussions with DSP designers about fast synchronous data memory for signal processors.
An even more intriguing possibility lies in the area of buffer memory for ASICs. Prevailing wisdom says that more and more of this memory will move on-chip as ASIC densities increase. But a supply of big, inexpensive SRAMs bursting at 300 MHz could change all that.
In many applications, such an SRAM could be used outside the ASIC with little or no loss in performance. The initial access time would be higher, to be sure. But in applications that were burst-oriented, that might not be as important as the cycle frequency, particularly with DDR.
So the new, seemingly microprocessor-specific SRAMs could after all regain their role as technology leaders for the commodity market. The future may see big blocks of memory migrating back outside of ASICs and into large, enormously fast SSTL or HSTL DDR SRAMs, their prices driven down by a fiercely competitive workstation and PC market.
The old order would live on.
Copyright (c) 1998 CMP Media Inc.