Technology Stocks : ADI: The SHARCs are circling!
To: BostonView who wrote (1120) 10/14/1998 10:18:00 PM
From: Danny Hayden
 
Microprocessor Forum: DSP architectures
vie for telecom slots

By Stephan Ohr
EE Times
(10/14/98, 2:02 p.m. EDT)

SAN JOSE, Calif. — Two new DSP architectures show entirely different
approaches to the problems of power and programmability.

The StarCore 400 architecture, jointly developed by Motorola Inc.'s
Semiconductor Products Sector (Austin, Texas) and Lucent Technologies
(Murray Hill, N.J.), is a compiler-driven design intended to extract maximum
performance from programs written in C code. It features a scalable
computational model and instruction set.

The new TigerSharc superscalar architecture from Analog Devices Inc.,
which claims high performance for programs generated in C or other
high-level languages, is built on the rapid-fire response of short instructions
fed over buses with an aggregate bandwidth of 12 Gbytes/second. TigerSharc
promises to execute up to 2 billion multiply-accumulate (MAC) operations per second.

Both architectures, unveiled at the 11th annual Microprocessor Forum, will
compete with Texas Instruments Inc.'s C6X devices for design wins in
telecommunications line cards and cellular basestations.

The StarCore architecture will compete head-to-head with TI's C6X
architecture in offering the parallelism that telecom switching systems seem
to want. It also pursues the C6X goal of eliminating assembly language
programming. Rather than using a very long instruction word (VLIW)
approach, StarCore uses variable length execution sets (VLES), which can
expand or shrink with each hardware implementation of StarCore. "This is
'post-VLIW,' " said Kevin Kloker, architecture director of StarCore and
deputy director of Motorola's StarCore design center. "This is VLIW done
right."

Indeed, the StarCore 400 qualifies as one of the industry's first
"compiler-driven" DSP designs, in which the C-code compiler and the
hardware are precisely tuned to each other. The variable length execution
sets (VLES) are actually groupings of basic instructions intended for specific
execution units. In the operation of the compiler, the C code is scanned with
reference to a specific StarCore implementation, and basic instructions are
grouped together and scheduled according to the "discovered parallelism."
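
The grouping step described above can be pictured with a toy scheduler. This is only a minimal sketch of the general idea of packing independent instructions into variable-sized sets; it is not the StarCore compiler's algorithm, which the article does not describe. The `Instr` type, the register names, the sample program, and the greedy in-order policy are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """A hypothetical basic instruction: one destination, some sources."""
    name: str
    dest: str
    srcs: tuple


def group_into_execution_sets(instrs, max_width):
    """Greedily pack in-order instructions into execution sets.

    A new set is started when the current one already holds `max_width`
    instructions, or when the next instruction reads or writes a register
    written earlier in the current set. `max_width` stands in for the
    amount of parallelism a given hardware implementation offers.
    """
    sets = []
    current = []
    written = set()  # registers written by instructions in `current`
    for ins in instrs:
        depends = ins.dest in written or any(s in written for s in ins.srcs)
        full = len(current) >= max_width
        if current and (depends or full):
            sets.append(current)
            current, written = [], set()
        current.append(ins)
        written.add(ins.dest)
    if current:
        sets.append(current)
    return sets


# Hypothetical four-instruction program used only for illustration.
PROGRAM = [
    Instr("mac0", "a0", ("x0", "y0")),
    Instr("mac1", "a1", ("x1", "y1")),
    Instr("add", "a2", ("a0", "a1")),   # needs both MAC results
    Instr("move", "r0", ("r1",)),
]

if __name__ == "__main__":
    for width in (1, 2, 4):
        grouped = group_into_execution_sets(PROGRAM, width)
        print(width, [[i.name for i in s] for s in grouped])
```

The same source program yields a different number of execution sets depending on `max_width`, so no padding with no-ops is needed when the hardware is narrower or wider. That is the property the article attributes to VLES.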

Instruction parallelism is scaled by the compiler. "The object is to achieve
multiple things with each clock cycle," Kloker said, "but you don't want 'no
ops' or alignment issues." The StarCore uses a basic orthogonal 16-bit
instruction — it's "compiler friendly," Kloker said — and will avoid alignment
issues in the instruction pipeline with "prefix." This results in a compiled code
density comparable to that of the best embedded RISC processors, Kloker
said.

Since StarCore is intended to be a scalable architecture, actual hardware
implementations can vary in the amount of parallelism they embody. Some
implementations can have two MACs, like Lucent's DSP16000, for example,
while others may have more.

TI's C6X has eight parallel execution units; its compiler is always looking for
instructions to parallelize. "The problem with this approach is that it's useful
for only those applications which are not power-sensitive," said Kloker. "It is
not designed for scalability." With its compiler-driven approach, the StarCore
400 "is one of the first to do a practical job of scalability," he said.

In all cases, the computational resources will include data ALUs and
registers, address ALUs and address registers, and instruction registers and
instruction set accelerators. Since everything running through the machine
stems from memory accesses, data bandwidth and instruction bandwidth will
be the most important factors governing performance. The video and
multimedia capabilities required by MPEG-4 and third-generation wireless
phones will demand billions of MAC operations per second, the StarCore
team acknowledged. The StarCore 440, a version of the architecture due in
the second half of 1999, is predicted to deliver 1.2 billion DSP MACs per
second, with 4 MACs per tick of a 300-MHz clock. With a 128-bit VLES
instruction grouping (and two instructions used for MACs) the machine will
actually execute 6 instructions per clock — 3,000 RISC Mips. The data word
is 16 bits wide, with a 32-bit address word and 40-bit accumulators. The
machine takes in 8 data words per clock, or 4.8 Gbytes per second.
Implemented in 0.13-micron CMOS, the chip is expected to
consume less than 0.1 mA per DSP MAC at 1.5 V.
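
The StarCore 440 throughput figures follow from simple products of the clock rate and the per-clock counts quoted above. The sketch below redoes that arithmetic; the constant and function names are illustrative, and the only inputs are numbers stated in the article.

```python
# Figures quoted in the article for the StarCore 440.
CLOCK_HZ = 300_000_000          # 300-MHz clock
MACS_PER_CLOCK = 4              # 4 MACs per tick
INSTRUCTIONS_PER_CLOCK = 6      # 6 instructions per clock
DATA_WORDS_PER_CLOCK = 8        # 8 data words per clock
DATA_WORD_BITS = 16             # 16-bit data word


def per_second(clock_hz, count_per_clock):
    """Events per second given a clock rate and a per-clock count."""
    return clock_hz * count_per_clock


def data_bandwidth_bytes(clock_hz, words_per_clock, word_bits):
    """Data bytes moved per second."""
    return clock_hz * words_per_clock * word_bits // 8


if __name__ == "__main__":
    print("MACs/s:        ", per_second(CLOCK_HZ, MACS_PER_CLOCK))
    print("Instructions/s:", per_second(CLOCK_HZ, INSTRUCTIONS_PER_CLOCK))
    print("Data bytes/s:  ", data_bandwidth_bytes(
        CLOCK_HZ, DATA_WORDS_PER_CLOCK, DATA_WORD_BITS))
```

The MAC rate (1.2 billion per second) and the data bandwidth (4.8 Gbytes per second) match the article. The plain instruction-rate product is 1.8 billion per second, which is lower than the quoted 3,000 RISC Mips. The article does not say how that figure was counted, so the sketch does not try to reproduce it.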

Also on Wednesday (Oct. 14), Analog Devices unveiled its TigerSharc
architecture, which it said will perform 2 billion 16-bit MAC operations per
second — theoretically, 8 MACs per tick — with a 250-MHz clock.

ADI's TigerSharc device actually has two computational units, each capable
of a 32 x 32 multiply, and each fed by a 128-bit wide data bus.
Three 128-bit buses actually shuttle across this DSP (two for data, one for
instructions), making for an aggregate bandwidth of 12 Gbytes/s. Each
second, the machine cranks through the equivalent of two billion 16-bit
MACs, or 500 million 32-bit MACs, or 8 billion 8-bit operations.
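
The TigerSharc numbers can be cross-checked the same way. The sketch below computes the aggregate bus bandwidth from the bus count and width, then works backward from the quoted rates to the number of operations per clock each data width implies. Names are illustrative, and the per-clock counts are derived from the article's figures rather than taken from any ADI documentation.

```python
# Figures quoted in the article for TigerSharc.
CLOCK_HZ = 250_000_000      # 250-MHz clock
BUS_WIDTH_BITS = 128        # each bus is 128 bits wide
NUM_BUSES = 3               # two data buses plus one instruction bus

# Quoted operations per second, keyed by operand width in bits.
QUOTED_OPS_PER_SECOND = {
    32: 500_000_000,
    16: 2_000_000_000,
    8: 8_000_000_000,
}


def aggregate_bandwidth_bytes(clock_hz, bus_bits, num_buses):
    """Bytes per second if every bus moves one full word per clock."""
    return clock_hz * (bus_bits // 8) * num_buses


def implied_ops_per_clock(ops_per_second, clock_hz):
    """Operations each clock tick must retire to reach the quoted rate."""
    return ops_per_second // clock_hz


if __name__ == "__main__":
    print("Aggregate bytes/s:", aggregate_bandwidth_bytes(
        CLOCK_HZ, BUS_WIDTH_BITS, NUM_BUSES))
    for bits, rate in QUOTED_OPS_PER_SECOND.items():
        print(f"{bits}-bit ops per clock:", implied_ops_per_clock(rate, CLOCK_HZ))
```

The bandwidth comes to 12 Gbytes per second, as stated. The implied per-clock counts are 2 operations at 32 bits (one per computational unit), 8 at 16 bits, and 32 at 8 bits.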

"That's the beauty of this architecture," said Gerry McGuire, 32-bit DSP
product line manager for ADI (Norwood, Mass.). The TigerSharc is
impervious to data types, he said. It will accept 8-, 16- or 32-bit data, and
adapt accordingly on the fly. The execution unit will scale automatically,
instruction-by-instruction, said McGuire.

"Mixing and matching the data types allows the architecture to be tuned to
the precision of the task at hand," he said. Cellular basestation applications
require high bandwidth to support new media types; remote access servers
must support multiple channels on one chip to carry more subscribers at
lower costs; and new air interface standards, vocoders and modems will
demand programmability. Modulation and demodulation use 16-bit data types,
as does voice coding. But filtering, echo cancellation, and line equalization
can use either 16- or 32-bit data types. Forward error correction such as
Viterbi detection can use either 8- or 16-bit data types, but new convolution
codes use 32 bits. Thus, it's important for a processor to handle all of these.

While TigerSharc embodies a wide array of parallel resources, its
programming model is closer to a short-pipeline RISC machine — an
architecture Analog Devices calls "static superscalar." There are no
ambiguities for the compiler, said McGuire. The machine has a very
"deterministic execution flow," McGuire said. Everything is accomplished on
the instruction level. There are 128 general purpose registers, and a
programmer decides how these are used, he said. There is a two-cycle delay
for every instruction, i.e., it takes two cycles for the results of a
computational instruction to appear in the output register. This makes use of
TigerSharc relatively easy for an assembly language programmer, McGuire
said.

ADI offers plenty of high-level language support for programmers. With a
32-bit architecture, for example, it offers orthogonal addressing, no special
hardware modes and user-determined branch prediction. Nevertheless, the
TigerSharc's organization is intended to make things easy for the assembly
language programmer. "Sometimes you gotta get your hands dirty," McGuire
said about the need for assembly language programming. While futurists
insist that DSP programming will depend more and more on C or C++, the
highest DSP performance achieved to date has depended on program
tweaking and tuning in assembly.

Samples of TigerSharc, produced in a 0.25-micron process, will be available
in 1999, and CMOS scaling will allow the device to be clocked at higher
speeds, according to ADI. The architecture will produce over 5 billion 16-bit
MAC operations per second from a 0.1-micron process running at 600 MHz,
McGuire said.
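
As a rough check on the scaling claim, the sketch below assumes the scaled part still retires 8 16-bit MACs per tick, the figure the article gives for the 250-MHz device. That assumption is not stated in the article for the 0.1-micron version, and the function names are illustrative.

```python
# Assumption: 8 16-bit MACs per tick, as quoted for the 250-MHz device.
MACS_PER_CLOCK_16BIT = 8


def macs_per_second(clock_hz, macs_per_clock=MACS_PER_CLOCK_16BIT):
    """16-bit MAC rate at a given clock, under the assumption above."""
    return clock_hz * macs_per_clock


def clock_needed_for(target_macs_per_second, macs_per_clock=MACS_PER_CLOCK_16BIT):
    """Clock rate (Hz) needed to reach a target MAC rate."""
    return target_macs_per_second // macs_per_clock


if __name__ == "__main__":
    print("MACs/s at 600 MHz:", macs_per_second(600_000_000))
    print("Clock for 5e9 MACs/s:", clock_needed_for(5_000_000_000))
```

Under that assumption, 600 MHz gives 4.8 billion MACs per second, and 5 billion would need about 625 MHz. The quoted "over 5 billion" may therefore reflect rounding or a different per-clock count; the article does not say which.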

In an effort to keep the Microprocessor Forum announcements in
perspective, Texas Instruments called editors in advance of the forum to say
that its C6X DSP was the only high-performance architecture currently
shipping, and the only one with significant design wins. "This is an
architecture that's good for C," said Henry Wiechman, product marketing
manager for TI (Dallas). "We've continued to improve the compiler."
Efficiencies of 90 percent, and sometimes 100 percent, can be obtained for
the C6201, Wiechman said. Moreover, tooling is being developed which will
help improve the code size for memory-constrained applications.

Based on the sale of linkers, machinery that connects code-development
platforms to target hardware, TI said over 1,000 designs are currently in
progress using the C6X architecture, and new designs are being added at a
rate of 10 per day. Code Composer Studio, introduced at DSP World last
month, will help C6X programmers work more efficiently and get to
market faster, Wiechman said.