Technology Stocks : Intel Corporation (INTC)
To: Tenchusatsu who wrote (89716)
From: Tenchusatsu
Date: 10/7/1999 9:18:00 PM
 
MY NOTES ON THE RECENT MICROPROCESSOR FORUM (PART 1):

Be warned, there is a huge amount of information in this post, even though I've pared it down. Also, please excuse me if my notes have a bias towards Intel. I am, after all, an Intel employee, so a lot of these notes will be from an Intel employee's point of view.

Keynote speech by Professor John Hennessy:

· Professor Hennessy seems to be a well-respected figure even among the top-level executives and senior engineers at the conference. Almost every presentation that followed made a reference or two to Prof. Hennessy's talk.
· "All of us hardware designers need to help out the software guys, because they need it."
· Access to information is the killer app. Net connection is more important than any one device.
· However, we may have oversold the Internet. (e.g. Internet refrigerator?)
· Should we use software techniques to exploit more ILP (EPIC, IA-64), or hardware techniques? No clear-cut winners at present. Try using a combination of both?
· Verification teams are becoming huge. The growing percentage of effort spent on validation scares Hennessy.
· Steeper slopes ahead when it comes to developing new advanced features in microprocessors. Not really a performance wall, but more like pushing a boulder up a mountain that gets steeper and steeper. What if the boulder rolls back over the development team?
· "I think Doommarks are more important than SPECmarks"
· We need to ask what hardware can do about RAS, even if it's mostly a software problem. Hardware can try fault containment, or even try to contain the bugs of poorly written software.
· New challenge: Better price-performance. Integrate general-purpose with application-specific processors.
· Processors will be driven by new dynamics: power limitations, general-purpose vs. application-specific needs, more rigid cost limitations, cycles for better user interfaces like multimedia, reliability and availability.
· A new kind of "processor macho": how long your battery lasts, how little area the core takes up, how cheap the processor is, new applications the processor enabled, how long a system stays up.

Itanium (a.k.a. Merced) presentation by Harsh Sharangpani, Principal Engineer and IA-64 Microarchitecture Manager:

· Cache hierarchy renamed as L1 (instruction and data), L2 (unified on-chip), L3 (off-chip). It was previously known as L0/L1/L2.
· L3 cache size is 4 MB.
· No register renaming, reservation stations, complex dependency checking in hardware, etc.
· Can issue up to six IA-64 instructions in one clock cycle.
· Not completely static scheduling (common misconception of IA-64). "Right level of smarts." For example, register scoreboard is used to allow some out-of-order execution.
· 10-stage pipeline.
· 4 integer/MMX units, 2 FMACs (floating-point multiply-and-accumulate units), 2 SP FMACs for SSE, 2 load/store units.
· Can evaluate up to 3 branches per clock, optimized for clusters of branches.
· 4 extended-precision or double-precision FLOPs/cycle, 8 single-precision FLOPs/cycle (SIMD). "Great for security and for 3D graphics."
· Front-side bus uses special ECC encoding for consecutive 4-bit errors.
· Enhanced machine-check architecture: Continue on some errors, recover on some, contain on others. Poisoning data allows for graceful recovery.
· Seamless IA-32 compatibility in hardware.
· Production in mid-2000.
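
As a rough sanity check on the FLOPs/cycle figures in the list above, here is a minimal Python sketch. It counts each multiply-accumulate as two floating-point operations. The function name and the `simd_lanes` parameter are made up for illustration, and the assumption that each SP FMAC handles two single-precision values per cycle is mine; it was not stated in the presentation, but it is what makes the 8 FLOPs/cycle figure come out.

```python
def peak_flops_per_cycle(fmac_units, simd_lanes=1):
    """Peak floating-point ops per cycle for a set of multiply-accumulate units.

    Each multiply-accumulate is counted as 2 operations (one multiply, one add).
    `simd_lanes` is the number of values each unit processes per cycle.
    """
    return fmac_units * simd_lanes * 2


# 2 FMACs, one extended- or double-precision value each per cycle.
dp_flops = peak_flops_per_cycle(fmac_units=2)

# 2 SP FMACs for SSE. ASSUMPTION: 2 single-precision lanes per unit.
sp_flops = peak_flops_per_cycle(fmac_units=2, simd_lanes=2)

print("double/extended precision FLOPs per cycle:", dp_flops)
print("single precision FLOPs per cycle:", sp_flops)
```

Multiplying either figure by a clock frequency would give peak FLOPS, but no clock speed was quoted for Itanium in these notes, so the sketch stops at per-cycle numbers.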

Power4 Dual-CPU chip presentation, Jim Kahle, IBM:

· "It's the memory, stupid!"
· Two processor cores on one chip. Each core is based on Power3 architecture (I think), and each core can run over 1 GHz.
· Processor cores share very high speed chip-chip communication. L2 cache is also shared, as well as L3 cache. (L3 cache tags are on chip; data is off-chip)
· Multi-chip Module: Four chips on one 4.5" x 4.5" module, interconnected by network of unidirectional ports for high bandwidth. Effectively 8-way multiprocessing on a module.
· A single chip has 2200 signal I/Os and 5500 total I/Os. (An Intel guy commented that with the expensive packaging required for that module, the silicon basically comes for free.)
· Process on 0.18u (gate length 0.12u), 7 layer, 170 million transistors. Probably uses copper interconnect as well.

Alpha EV8 with Simultaneous Multithreading, Joel Emer, Compaq:

· 1.2 to 2.0 GHz on 0.125u CMOS. SOI-compatible, copper interconnect, low-K dielectrics, 250 million transistors, 1100 signal pins in flip-chip packaging.
· 8-wide superscalar, large on-chip L2 cache, integrated RDRAM controller, directory-based memory system, ccNUMA.
· 4-way simultaneous multithreading (SMT): Basic idea is to utilize unused execution slots in processor for other threads.
· SMT allows performance improvement of over 100% on SpecInt and 50% on SpecFP, compared to Alpha 21264. Transaction processing (i.e. server apps) sees more than 100% improvement.
· SMT needs to be introduced to software developers.
· SMT makes processor more latency-tolerant.
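
To make the "fill unused execution slots with other threads" idea concrete, here is a toy Python sketch. It is not a model of EV8: the per-cycle ready-instruction counts are made-up numbers, the function names are hypothetical, and the only figures taken from the talk are the 8-wide issue and the 4 threads.

```python
ISSUE_WIDTH = 8  # EV8 was described as 8-wide superscalar


def issued_per_cycle(ready_by_thread, width=ISSUE_WIDTH):
    """Fill up to `width` issue slots from each thread's ready instructions, in order."""
    issued = 0
    for ready in ready_by_thread:
        issued += min(ready, width - issued)
    return issued


def utilization(trace, width=ISSUE_WIDTH):
    """Fraction of issue slots used over a trace of per-cycle ready counts."""
    used = sum(issued_per_cycle(cycle, width) for cycle in trace)
    return used / (width * len(trace))


# HYPOTHETICAL ready-instruction counts per cycle for 4 threads.
trace_4t = [
    (3, 2, 1, 4),
    (0, 5, 2, 1),  # thread 0 stalled, e.g. waiting on a cache miss
    (2, 0, 3, 3),
    (1, 1, 0, 2),
]
# The same workload with only thread 0 running.
trace_1t = [(cycle[0],) for cycle in trace_4t]

print(f"1 thread : {utilization(trace_1t):.0%} of issue slots used")
print(f"4 threads: {utilization(trace_4t):.0%} of issue slots used")
```

In the second cycle thread 0 has nothing ready, yet the other threads keep the machine busy, which is also the intuition behind the latency-tolerance claim.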

SPARC64 V, Michael Shebanow, HAL Computer Systems:

· Systems built by Fujitsu, late 2001, tapeout 3Q00.
· 1 GHz clock speed with 9-stage pipeline.
· Not much to report here; it's just a refinement on traditional RISC architecture.
· "Predication bets against branch predictor, which is pretty good."
· Power target 100 watts, temperature target 85 degrees Celsius.
· Performance estimates: 70+ SpecInt, 130+ SpecFP, 4 GFLOPs.
· Die size 380 mm^2 on 0.12u process, 65 million transistors.
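
The quote about predication betting against the branch predictor can be illustrated with a crude cost model. The Python sketch below is only an illustration under loud assumptions: every number is invented, the function names are hypothetical, and the model is serial (it ignores that a wide machine can overlap the two predicated paths), so it shows the shape of the trade-off and nothing about any real processor.

```python
def branch_cost(taken_len, not_taken_len, p_taken, mispredict_rate, penalty):
    """Expected cycles when the branch is predicted and only one path executes."""
    expected_path = p_taken * taken_len + (1 - p_taken) * not_taken_len
    return expected_path + mispredict_rate * penalty


def predicated_cost(taken_len, not_taken_len):
    """Cycles when both paths are issued under predicates, so nothing can be mispredicted."""
    return taken_len + not_taken_len


# HYPOTHETICAL numbers: two 4-cycle paths, a 50/50 branch, 10-cycle mispredict penalty.
for rate in (0.02, 0.10, 0.50):
    with_branch = branch_cost(4, 4, 0.5, rate, penalty=10)
    with_predication = predicated_cost(4, 4)
    print(f"mispredict rate {rate:.0%}: "
          f"branch {with_branch:.1f} cycles, predicated {with_predication} cycles")
```

Under these assumptions predication only wins once the predictor is wrong often enough that the misprediction penalty outweighs the cost of executing both paths, which is the bet the speaker was describing.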

AMD Athlon processor: Future Directions, Fred Weber:

· Current status: Production wafers for 0.18u already going through.
· Athlon 700 gets 32 SpecInt, can get 35-37 SpecInt w/ 266 MHz bus and projected compiler and prefetch optimizations. SpecFP is currently at 23, can get 34-41 SpecFP w/ above enhancements.
· Plan in 2000: Add 1 MB and 2 MB full-speed, 16-way set-associative L2 cache. 266 MHz front-side bus. 2-way MP chipset from AMD w/ DDR SDRAM. Development of API and HotRail chipsets underway.
· LDT (Lightning Data Transport): Point-to-point, cache-coherent link for scalable multiprocessing. It can be 8, 16, or 32 bits wide in each direction (two unidirectional ports, as opposed to SP's simultaneous bidirectional signaling). Each pin is up to 1.6 Gbits/sec. A 16/16-bit link will provide 3.2 GB/sec bandwidth in each direction, for a total of 6.4 GB/sec bandwidth.
· AMD to add 64-bit extensions to x86 called x86-64. Purpose is to enable large memory OS and applications. 5% die area cost. Allows migration from 32-bit to 64-bit to be seamless and at the user's pace. AMD sees an opportunity opening up since Intel is going with brand new instruction set for IA-64. (It seems like x86-64 is to IA-32 what the 386 32-bit instructions were to the 286.)
· "Instruction Set is one of the Weaker Tools [for improving architectural performance]." (Anti-Merced FUD?)
· AMD plans to deploy multiple x86-64 processors on a single die. (Heh, good luck trying to manufacture that!)
· To work around the limiting x87 FPU register stack, AMD will introduce technical floating point instructions (TFP) with x86-64. It will include direct access to a large register file, and it should close the SpecFP performance gap with the RISC competition.
· 64-bit solutions like IA-64 maintain compatibility through emulation, either through software or hardware. Any x86 application is relegated to 2nd class status. "AMD will not relegate x86 to 2nd class status. Compatibility is key."
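
The LDT bandwidth figures in the list above follow directly from the link width and the per-pin rate. A minimal Python sketch of the arithmetic, where the function name is hypothetical and 1 GB is taken as 10^9 bytes:

```python
def ldt_bandwidth_gb_per_s(width_bits, gbit_per_pin=1.6):
    """Peak one-direction bandwidth in GB/s for a link `width_bits` wide.

    Assumes one data pin per bit of width and 8 bits per byte; the 1.6 Gbit/s
    per-pin rate is the maximum quoted in the presentation.
    """
    return width_bits * gbit_per_pin / 8


for width in (8, 16, 32):
    one_way = ldt_bandwidth_gb_per_s(width)
    print(f"{width:2d}-bit link: {one_way:.1f} GB/s each way, "
          f"{2 * one_way:.1f} GB/s total")
```

For the 16/16-bit case this gives 16 x 1.6 / 8 = 3.2 GB/sec per direction, matching the number quoted in the talk.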

Overall, this first half of day one was very interesting. There were many methods introduced which attack the performance problem of high-end systems.

· Intel is going to look for more instruction-level parallelism (ILP).
· IBM is going with dual cores on a chip and multiple chips on a module to exploit symmetric multiprocessing (SMP).
· Compaq is going with simultaneous multithreading (SMT) to exploit thread-level parallelism (TLP).
· HAL/Fujitsu is refining its SPARC RISC implementation even more.
· AMD is going to extend the existing x86 instruction set even more.

There were also two panel discussions before lunch. The first one brought back all of the presenters for a Q&A session on "Approaches to High-Performance Processor Design":

· In general, each guy defended his solution as the correct method. There were also a few jabs thrown against IA-64, as expected from guys who stand to lose the most from it. ;-)
· Harsh of Intel said that nothing is preventing us from implementing simultaneous multithreading in IA-64, or multiple cores on a chip, etc. But first, Intel wants to provide a scalable building block.
· Fred Weber of AMD said that most of the improvements in RISC can be applied back to x86. He also feels that IA-64 EPIC doesn't provide much of a boost over RISC.

The second panel discussion talked about "Server System Architecture Trends" and had representatives from HP, HotRail, Intel (Justin Rattner, Intel Fellow), Sun, Compaq, and IBM.

· HotRail expands beyond limits of shared MP bus with its switched-fabric topology. (Architecture will use LDT, which is being co-developed with AMD.)
· HP will do both shared bus (still viable, especially in the low and mid-end) and switched-fabric architecture.
· Merging of NGIO and Future I/O into "System I/O" (SIO) was better for the consumer. The theory of the merged SIO spec may be idealistic. Practice has yet to be realized because it has only been four weeks since the conception of SIO.
· IBM owns POWER architecture, so it owns its own server platform. IBM said this is better for customers who demand reliability, flexibility, management like hot-plug, and robustness.
· Justin Rattner said there will be a move toward much smaller servers and dependency on applications for RAS. Server will become the field-replaceable unit. IBM countered by saying that multi-billion dollar corporations like banks won't use little disposable servers. Sun said there is no single answer for the Internet.
· Compaq said customers are looking for more than performance. They want features, and proprietary systems enable these features more easily than commodity systems.
· Steve MacKay of Sun said that software lags behind hardware features like NUMA. He hopes that the programming model will move toward SMP as much as possible.
· Justin said that DDR SDRAM is the next memory technology for servers, but pin limitations drive us to Rambus. He's hoping that if Rambus can dominate desktops and non-PC markets, it will later dominate servers.
· HotRail once again emphasized its switched-fabric topology over shared-bus. (It's an artificial distinction, in my opinion. At least one guy I met there agreed with me.)

Tenchusatsu