[PC_Support] How AMD and its partners are putting x86 back on the right track ... Bryan J. Smith pc_support@lists.leap-cf.org Sat May 25 03:02:15 2002
--------------------------------------------------------------------------------
When is Intel "IA-32" (aka Intel "x86") complex instruction set computing (CISC) going to finally die? That question has been asked ever since MIPS R2000 processors hit the market in the mid-80s. While the "kill x86" vs. "x86 forever" debate rages on, the company behind the latter may actually be the one best positioned to kill the former, as we will see.
Overview:
- IA-64: When Reality Breaks Theory
- Athlon: The Re-programmable Pentium
- AMD x86-64 and Intel Yamhill
- Digital FX Flashbacks
- Transmetting the Future
- IA-64: WHEN REALITY BREAKS THEORY
The '60s introduced complex instruction set computing (CISC), quickly followed by Intel's birth of the microprocessor, on which all of its subsequent products would be based. As CISC moved into superscalar and pipelined designs, it proved obviously difficult to optimize. So the '80s brought us reduced instruction set computing (RISC), drastically reducing logic size and design times, and many CISC vendors made their switch then and there. Unfortunately, RISC still didn't solve the problem of pipelines sitting only about 50% utilized at any given time. So when Intel, having skipped the RISC generation, finally decided to move away from its CISC backbone, it set out to address the shortcomings of RISC with an approach for the 21st century known as explicitly parallel instruction computing (EPIC).
EPIC is extremely innovative. It uses heavy compile-time optimization to assemble traditionally 32-bit-sized RISC instructions into a 128-bit very long instruction word (VLIW) bundle of three 41-bit instructions plus a few template/control bits. This eliminates a lot of overhead in the run-time design of the processor, making RISC even more RISC. And in an effort to completely eliminate the dreadful event of a processor stall caused by branch misprediction, it introduced the concept of branch "predication," where both sides of a branch are executed and the result of the road not taken is discarded once the branch is resolved. Unfortunately, EPIC wasn't as good in silicon as Intel thought it would be.
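The bundle arithmetic above is easy to verify: three 41-bit slots plus a 5-bit template field is exactly 128 bits. Here is a minimal sketch of that packing in Python -- illustrative bit-shifting only, not the real Itanium encoding (the actual template semantics and slot types are far richer):

```python
# Toy sketch of an IA-64-style bundle: three 41-bit instruction slots
# plus a 5-bit template field = 128 bits total. Field layout here is an
# assumption for illustration, not the real Itanium bit assignment.

SLOT_BITS = 41
TEMPLATE_BITS = 5
BUNDLE_BITS = 3 * SLOT_BITS + TEMPLATE_BITS  # 128

def pack_bundle(template, slot0, slot1, slot2):
    """Pack a template and three instruction slots into one 128-bit int."""
    for s in (slot0, slot1, slot2):
        assert s < (1 << SLOT_BITS), "instruction must fit in 41 bits"
    assert template < (1 << TEMPLATE_BITS), "template must fit in 5 bits"
    # Template in the low bits, then slots 0..2 above it.
    return (template
            | (slot0 << TEMPLATE_BITS)
            | (slot1 << (TEMPLATE_BITS + SLOT_BITS))
            | (slot2 << (TEMPLATE_BITS + 2 * SLOT_BITS)))

def unpack_bundle(bundle):
    """Reverse pack_bundle: recover (template, slot0, slot1, slot2)."""
    slot_mask = (1 << SLOT_BITS) - 1
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & slot_mask
             for i in range(3)]
    return (template, *slots)

bundle = pack_bundle(0x1, 0xAAAA, 0xBBBB, 0xCCCC)
assert bundle.bit_length() <= BUNDLE_BITS
assert unpack_bundle(bundle) == (0x1, 0xAAAA, 0xBBBB, 0xCCCC)
```

The point of the exercise: all the grouping work happens before the bits are even emitted, which is exactly the compile-time burden EPIC shifts off the silicon.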
The first Intel IA-64 processor, Itanium, wasn't a flop just because it did not run older IA-32 CISC code well. It failed to really keep its pipelines 90% full as the EPIC approach promised -- despite heavy compiler optimization development. And when it came to branch predication, the savings in stalls were not worth the extra, useless work the processor committed itself to by executing the branch that would not be taken. While Intel is addressing the utilization issue with the addition of traditional run-time optimization, and even some traditional branch prediction, in its 2nd-generation IA-64 processor, "McKinley," even Intel itself is wondering whether it made the right call in transitioning away from CISC IA-32.
- ATHLON: THE RE-PROGRAMMABLE PENTIUM
You've never heard an Intel engineer curse more than when they speak of Matrix Math eXtensions (MMX) or Streaming SIMD Extensions (SSE). Intel has not only bloated its CISC IA-32 instruction set with such concoctions, but has ended up handing its engineering teams all kinds of tangent designs to figure out how to slap onto their cores. Instead of evolving the now-aged Pentium core design with more general arithmetic logic unit (ALU) and floating point unit (FPU) pipes and registers, they slap on more "lossy, application-specific" integer-float interpolating logic and dedicated registers for it. Worse yet, they still haven't addressed their less-than-ideal out-of-order and branch prediction units, because the whole Pentium series was supposed to have been superseded by IA-64 EPIC/predication by now.
The result is a chip that excels at specific, visual applications where accuracy is not necessary, but that is not so fast at general applications, let alone engineering and scientific ones.
While the well-funded Moore and Co. design teams were busy either adding accessories to their Mustang or building a prototype only millionaires could afford, his former Fairchild colleague Sanders was off spending the few R&D dollars he had to build a Viper. His team took the aged muscle-car approach they knew worked and refined and modernized it -- more pipes, better branch prediction, lots of buffering -- into a solid, efficient, 9-issue core in a few years instead of a decade. Not the most efficient design, easily double the size of an original RISC core, but it was built to run code written for a four-decade-old approach. The result would be known to end-users as the Athlon, a core design that will serve them a good 5 years before it needs to be overhauled.
AMD has always led Intel in ALU performance and memory loads, and its branch prediction unit was based on lessons learned in the K6 (where it was overkill). But the Athlon's greatest strength is its 3-issue FPU, which causes Intel headaches to this day. Whenever Intel adds another 50+ opcodes for some fancy-schmancy multimedia niche, AMD just writes some microcode to leverage its FPU (or ALU in some cases) to do it. So while Intel has to slap on yet another execution unit and more registers, AMD just figures out which FPU pipes to use and which registers to dedicate to it. The effort is far less, and more time can be spent optimizing the accommodation within the existing design, instead of rushing to finish the "slap-on" design, doing timing resolution of the new logic against the old, etc...
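The microcode trick described above -- decomposing a new packed SIMD opcode into scalar micro-ops for the existing FPU pipes, rather than bolting on a dedicated unit -- can be sketched like this. This is a conceptual toy, not actual Athlon microcode; the 4-wide packed add and the cost counter are assumptions for illustration:

```python
# Illustrative sketch: implement a packed SIMD add with scalar micro-ops
# on an existing scalar FPU, instead of adding a dedicated SIMD unit.

micro_op_count = 0

def fpu_add(x, y):
    """Stand-in for one pass through an existing scalar FPU pipe."""
    global micro_op_count
    micro_op_count += 1  # each lane costs one scalar micro-op
    return x + y

def simd_packed_add(a, b, scalar_add=fpu_add):
    """Emulate a 4-wide packed add using only a scalar add unit."""
    assert len(a) == len(b) == 4
    # One scalar micro-op per lane, scheduled across existing FPU pipes.
    return [scalar_add(x, y) for x, y in zip(a, b)]

result = simd_packed_add([1.0, 2.0, 3.0, 4.0], [10.0, 20.0, 30.0, 40.0])
assert result == [11.0, 22.0, 33.0, 44.0]
assert micro_op_count == 4  # four scalar micro-ops replaced one packed op
```

The cost is throughput (four micro-ops instead of one), but the engineering win is the one the article describes: no new execution unit, no new timing closure, just a microcode table entry.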
Although IA-64 also uses microcode to execute the bloated CISC IA-32 instruction set on its EPIC design, it wasn't designed for it like the Athlon was. Seeing the Intel IA-32 team add more and more junk to its product without giving the IA-64 team a thought reminds me a lot of another company, whose "Chicago" team did the same with their products without consulting the other guys in the same company.
- AMD x86-64 AND INTEL YAMHILL
The Athlon also did one more thing for AMD: it gave them their own hardware platform. No longer did AMD need to wait on Intel to move on the OEM end; they moved the platform themselves. Sure, the first 6 months saw few products, poor 3rd-party support, and even poor end-product reliability, but the platform boomed in no time, and by the end of the first year, few OEMs were limiting themselves to Intel. Now AMD is going to finish the job.
AMD x86-64 brings 64-bit addressing to IA-32 in a fully backward-compatible, similarly performing way. In fact, x86-64 is nothing special: it's just an Athlon with 64-bit addressing, another pipeline, and more registers, now 64 bits long. Nothing major to address in the overall design, other than adding in the addressing/register extensions and making sure it handles run-time resolution of switching between legacy and 64-bit modes.
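The backward compatibility works because the 64-bit extensions hide behind a single optional prefix byte. In AMD's x86-64 encoding, a one-byte REX prefix (binary 0100WRXB, i.e. bytes 0x40-0x4F) selects 64-bit operand size (W) and supplies a 4th bit for the register fields (R, X, B), doubling the visible register file from 8 to 16; legacy code without the prefix decodes exactly as before. A minimal decoder sketch:

```python
# Sketch of x86-64 REX prefix decoding (0100WRXB = 0x40-0x4F).

def decode_rex(byte):
    """Return REX fields if `byte` is a REX prefix, else None."""
    if byte & 0xF0 != 0x40:
        return None  # not a REX prefix; decode as legacy IA-32
    return {
        "W": (byte >> 3) & 1,  # 1 = 64-bit operand size
        "R": (byte >> 2) & 1,  # extends ModRM.reg to reach r8-r15
        "X": (byte >> 1) & 1,  # extends SIB.index
        "B": byte & 1,         # extends ModRM.rm / SIB.base
    }

# e.g. "48 89 C3" is mov rbx, rax -- the 0x48 prefix sets W=1.
assert decode_rex(0x48) == {"W": 1, "R": 0, "X": 0, "B": 0}
assert decode_rex(0x89) is None  # plain legacy opcode, no prefix
```

This is why the article can call x86-64 "nothing special" in the best sense: the legacy decode path is untouched, and 64-bit mode is literally one prefix byte away.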
Since IA-64 "McKinley" won't arrive until x86-64 does as well, Intel realized it had far too many eggs in one basket. Although Intel has not confirmed it, its "Yamhill" project is an effort to build an x86-64-compatible processor. That means Intel had to license AMD x86-64 -- which AMD has confirmed. And that means engineering bliss for the future of IA-32. Why?
AMD has a history of not bloating IA-32. Only once have they introduced instruction set extensions of their own (3DNow!), and those were done to address the _shortcomings_ of a marketing-driven extension set from Intel (MMX). Later refinements of those extensions were often just adoptions of Intel introductions and, as discussed before, done by adding microcode that uses the existing ALU/FPU pipes. Now that AMD controls the ISA as well as its own platform, IA-32 will finally "stabilize" under AMD's x86-64 leadership. Even Intel marketing will take a "back seat" for a while, as Intel cannot even hope to have an x86-64 competitor out until late 2003 -- a good year behind AMD.
- DIGITAL FX FLASHBACKS
AMD doesn't have the R&D dollars of Intel. Even though they spend a greater percentage on R&D than Intel (which spends a lot of that on marketing-related R&D projects), they cannot make a dent in comparison. So they rely on industry partnerships that contribute and proliferate their combined concepts, innovations, ideas and products into a community-designed platform. Nowhere is this more apparent than in the introduction of their ultra-flexible HyperTransport interconnect, which is being used by basically everyone outside of Intel -- even for Intel-platform systems in some cases.
At the forefront of this are employees of the former company known as Digital, later owned by Compaq, now owned by HP. These employees built the most anal of RISC designs, the Alpha microprocessor (uP), and the most practical of microcontroller (uC) designs, the StrongARM. They dominated the design of pretty much all of the enterprise-level system and bus logic and other interconnects -- EV6/7, PCI bridges, etc... And they seeded much of the commodity Ethernet market with their popular design, the Tulip. Although that collective engineering resource is gone, their footprint on history continues even today at AMD and partners like API Networks (fka Alpha Processor, Inc.). And one major technology they introduced continues to be undervalued.
When Digital created the Alpha, they created an ultra-clean 64-bit platform for _only_ 32/64-bit computing -- no 8- or 16-bit. This wasn't by mistake, nor was it just to show how efficient RISC could be when taken to a level of "analness" like the Alpha's. It was a hardware conduit for an innovative software concept and an associated set of tools. Those tools were FX!32, which silently won award after award for its approach.
FX!32 was a "binary compiler" (if I may call it that) that not only run-time emulated software written for another architecture or "byte code," but did run-time _conversion_ of binary executables and libraries from another architecture into Alpha. It then applied further post-conversion optimizations to the new Alpha binaries each time they were run -- to try to match the execution speed of the original -- and boy did it come close! It was a brilliant piece of work -- one that Digital needed not just to sell the NT/Alpha platform, since it could run NT/x86 binaries, but, more importantly, to allow users to run VAX/VMS binaries on the accompanying Alpha/VMS platform. Digital would even go as far as to introduce FX!32 software for Linux/x86 -> Linux/Alpha and even some limited UNIX/MIPS -> UNIX/Alpha.
Digital realized that software runs on an operating system platform, not just an architecture. While it is common to emulate other software platforms via library calls or even semi-virtualized hardware on the same architecture or "byte code" (e.g., VMware or WINE on x86), Digital found it far easier to emulate various other architectures (MIPS, VAX, x86) running the same software platforms as theirs (UNIX, VMS, Windows/Linux, respectively) on Alpha. And it didn't stop there, because they could _permanently_convert_ the binaries of those other architectures to Alpha. Because binaries are built for a software platform -- the architecture is just an instance of it.
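The emulate-then-convert loop described above can be sketched as a toy translator. This is a conceptual model with a made-up "source ISA," not FX!32's actual design: interpret foreign code on first encounter, profile it, translate hot blocks into native functions, and cache the translation so later runs skip emulation entirely -- the "permanent conversion":

```python
# Toy FX!32-style binary translation: emulate first, then translate hot
# blocks to "native" form and cache them. The two-op "source ISA" here
# is a made-up stand-in, not real x86 or Alpha code.

translation_cache = {}   # source block -> translated native function
exec_counts = {}         # profile data gathered while emulating
HOT_THRESHOLD = 2        # translate after this many emulated runs

def emulate(block, x):
    """Slow path: interpret the foreign binary one op at a time."""
    for op, n in block:
        x = x + n if op == "add" else x * n
    return x

def translate(block):
    """One-time conversion of a foreign block into a native function."""
    def native(x):
        for op, n in block:
            x = x + n if op == "add" else x * n
        return x
    return native

def run(block, x):
    key = tuple(block)
    if key in translation_cache:
        return translation_cache[key](x)  # fast, "native" path
    exec_counts[key] = exec_counts.get(key, 0) + 1
    if exec_counts[key] >= HOT_THRESHOLD:
        translation_cache[key] = translate(block)  # persist conversion
    return emulate(block, x)

block = [("add", 3), ("mul", 2)]
assert run(block, 5) == 16           # run 1: emulated
assert run(block, 5) == 16           # run 2: emulated, then translated
assert tuple(block) in translation_cache
assert run(block, 5) == 16           # run 3: native path
```

The key property -- the one the article credits to Digital -- is that the cache outlives the session, so each run of a program gets faster until it is effectively a native binary.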
The Digital Alpha technology was licensed to Samsung, AMD and Intel, with Intel now the owner of the platform. One has to wonder: if Intel had known then how IA-64 would perform today, would they not have bought Alpha long ago and used it as their next-gen, non-CISC platform? Alpha has _always_ been the highest-performing architecture. I mean, while an 800MHz, 0.18um Itanium toasts even a 2.4GHz, 0.13um Pentium 4 at floating point, even 3-year-old, 600MHz, 0.35um Alpha 264s _outperform_ that same Itanium by an even wider margin! Add in the fact that FX!32 on Alpha _greatly_outperforms_ Itanium when it comes to running x86 binaries, and one can only wonder whether we wouldn't have 64-bit Intel Alpha chips now, running at 4GHz on 0.13um, with fully supported FX!32 software for running legacy Windows and Linux binaries. And instead of talking about "fixing" IA-64 with "McKinley," we'd be talking about a new Alpha 364 design that is the best of both worlds -- adopting Intel EPIC ideas like compile-time optimization to improve RISC run-time utilization.
- TRANSMETTING THE FUTURE
So what's my point? The main reason we have NOT seen something like FX!32 since is that Intel keeps extending IA-32 and toying with IA-64. Yeah, Intel finally owns Alpha now, and while McKinley and later IA-64s will benefit, it's far too little, far too late. Now that AMD is commanding IA-32 c/o their 64-bit x86-64 -- maybe, just maybe, the AMD-API guys are thinking about going beyond legacy CISC IA-32. Maybe they are thinking of building their own 128-bit VLIW design. Or doesn't someone else already have one???
Yes, one company does. In fact, they looked at it a little differently. Instead of writing some add-on systems software that lets one architecture run software written for the same platform on another, this company put it in the firmware -- and that's all it does! It doesn't even market its own natively running software but _always_ runs the foreign bytecode. The Transmeta Crusoe architecture is a 128-bit VLIW RISC design with virtually _no_ microcode at the core; instead it uses a software/firmware-driven principle known as "code morphing" to take another bytecode and break it down into its raw, native VLIW words at run time. "Code morphing" is yet another innovative approach based on the simple fact that x86 bytecode rules the landscape and, like FX!32, on the fact that it is easier to support the same software platform on a different architecture than a different software platform on the same architecture. Furthermore, why else do you think they hired the guy who wrote the first operating system against the full Intel i386+MMU specification, Linus Torvalds? Because he knows x86 bytecode in and out! And guess who is also a licensee of the Transmeta IP?
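The run-time half of code morphing -- regrouping a linear foreign op stream into fixed-width VLIW words -- can be sketched like this. Again a toy, not Transmeta's actual firmware; the 4-atoms-per-molecule width and the op names are assumptions for illustration:

```python
# Toy sketch of code-morphing-style regrouping: decode a stream of
# foreign (x86-style) ops and pack them at run time into fixed-width
# VLIW "molecules" of "atoms," padding unused slots with no-ops.

MOLECULE_WIDTH = 4  # atoms per VLIW word (an assumption for this sketch)

def morph(ops):
    """Group a linear op stream into fixed-width VLIW molecules."""
    molecules = []
    for i in range(0, len(ops), MOLECULE_WIDTH):
        molecule = list(ops[i:i + MOLECULE_WIDTH])
        molecule += ["nop"] * (MOLECULE_WIDTH - len(molecule))  # pad
        molecules.append(molecule)
    return molecules

ops = ["load", "add", "store", "add", "mul", "branch"]
molecules = morph(ops)
assert molecules == [["load", "add", "store", "add"],
                     ["mul", "branch", "nop", "nop"]]
assert all(len(m) == MOLECULE_WIDTH for m in molecules)
```

A real morpher would also check dependences between atoms before packing them into one molecule and cache the translations like FX!32 does; the sketch shows only the repackaging idea.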
Yeah, the same company that is now in control of IA-32: AMD. Makes you wonder where this is all leading. Let me piece together my predictions for you ...
- As the new leader in x86-64, AMD will "permanently stabilize" IA-32 ISA. x86 bytecode will now be a "standard" that doesn't change.
- A new, 2nd generation 128-bit VLIW using HyperTransport will be born out of the AMD-API-Transmeta alliance. This chip, unlike Crusoe, will have native versions of 64-bit Windows and Linux released for it.
- A merger of FX!32 and Code Morphing concepts will lead to an improved "binary compiler" for both Windows and Linux. You will still have to run Windows/VLIW2 to run Windows/x86[-64] binaries and Linux/VLIW2 to run Linux/x86[-64] binaries, respectively, but it will finally move people away from IA-32/x86 by 2006-2007.
IMHO, if this happens, Intel will see its issues go exponential. Not only will they have a tough time proving to people that IA-64 is viable versus this new VLIW2, but their other strategy revolves around the "now dying" x86-64 ISA. Since IA-64 hasn't "caught on" yet and there is a very good chance that even the 2nd-gen "McKinley" won't either (the consumer version isn't due until late 2003), the only chance Intel has is to go x86-64 "full bore" and keep people from moving off it. So we're back to Intel actually being the "x86 forever" guys!
I could be wrong about AMD looking at VLIW. But something tells me all those former Alpha engineers are salivating over the Transmeta technology -- or at least thinking about making improvements to what they have already done. If this new "binary compiler" becomes available, x86 may very well die regardless of Linux adoption. In fact, Linux desktop adoption helps Intel with IA-64, so maybe continued Windows/x86 usage is in Transmeta-AMD's favor? So maybe AMD's support of Microsoft is not so blind, eh?
It's just hard to tell. But it's harder to sit by and watch good ideas and innovations that could easily move us away from x86 inefficiency to a new, RISC-like, VLIW bytecode platform not happen in the next 5 years. Because if it is going to happen, there is more chance of it coming from the AMD-Transmeta partnership than from Intel and its IA-64, IMHO. The irony is that it is AMD who is keeping x86 alive with x86-64, because their "seizing control" of it is our best chance of stabilizing it and finally getting off it. Because, like Microsoft with its Win/NT-ignorant Win/DOS market, Intel cannot keep its IA-32 marketeers from ruining any chance IA-64 has.
-- Bryan
-- The US government could be 100x more effective, and 1/100th the Constitutional worry, if it dictated its policy to Microsoft as THE MAJOR CUSTOMER it is, and not THE REGULATOR it fails to be.
---------------------------------------------------------------
Bryan J. Smith, SmithConcepts, Inc. mailto:b.j.smith@ieee.org
Engineers and IT Professionals smithconcepts.com | matrixlist.com