Fred & Intel Investors - Intel demonstrated a Merced emulation at last week's Intel Developer Forum.
The "Emulator" is an x86 32-bit system programmed to logically implement all the features of the MERCED architecture, and hence, execute MERCED instructions.
A 64-bit version of Windows NT was demonstrated, running in this emulation mode.
A clever observer may ask - Where did Intel happen to come across a 64 bit Merced-Compatible version of Windows NT?
There are two possibilities - the obvious one is probably wrong.
Paul
{=============================} eet.com
Posted: 11:45 p.m., EDT, 9/18/98
Emulator sheds early light on Merced software
By Alexander Wolfe
PALM SPRINGS, Calif. - Intel Corp. demonstrated the first Merced software ever booted up in public when it showcased a 64-bit version of Windows NT in a presentation at the Intel Developer Forum. The demo ran on a sophisticated software emulator that mimics the complete Merced instruction set, the chip's firmware-processor interface, and its multiprocessing interrupts.
"We have major operating systems booting and running on this pre-silicon development environment, which is a high-speed software emulator for [Merced] that runs on an IA-32 host," said Rumi Zahir, a senior architect at Intel. "It simulates an entire IA-64 platform; that includes multiple processors and the standard platform devices and it provides multiprocessor simulation capabilities."
Separately, Intel sources revealed several new details about some of Merced's internal features, including registers intended to enable advanced code-optimization features. New information on Katmai, Intel's advanced, 32-bit processor, also came to light.
Because samples of actual Merced silicon won't be available before mid-1999, Intel is providing its emulator to the software community in hopes of sparking the development of 64-bit software. "When Merced is ready, we can't just ship silicon. We understand that we need to have a fully functional system with all the software components," said Zahir. "Because it models more than just the CPU, this [emulator] environment enables developers to do operating-system and application porting, as well as development of firmware and BIOS code."
Though the emulator isn't publicly available, Intel has provided it under tight non-disclosure restrictions to a host of top-flight software houses. Along with the new version of Windows NT, other 64-bit OS implementations currently in the works for Merced are an IA-64 release of HP-UX from Hewlett-Packard, Unixware from SCO, Irix from Silicon Graphics, Modesto (a 64-bit version of Netware) from Novell, Solaris from Sun and Unix from Digital.
Indeed, according to Intel, some of those Unix releases are already up and running on the Merced emulator. "We have multiple operating systems and multiple apps up and booting," said Hemant Dhulla, IA-64 marketing manager at Intel. Those apps include a large, commercial database, Dhulla said.
For developers, the availability of Intel's emulator, along with related tools such as compilers, debuggers and performance-tuning software, is shedding new light on the tough task of creating IA-64 applications. Of crucial importance is how to make proper use of Merced's complex code-optimization features, notably predication and speculation. Also at issue is how to proceed with debugging. Finally, there are new tricks to be learned about porting existing 32-bit applications over to the coming 64-bit world. This will be important, because many developers will initially roll over their legacy software while they race to complete full-blown, 64-bit code.
Developing software for Merced won't be the cut-and-dried affair programming for previous X86 generations used to be, because Intel's IA-64 architecture is ushering in a new kind of cooperation between on-chip hardware and the compiler. With a heritage drawn from very-long-instruction-word (VLIW) architectures, Merced will contain a large number of parallel execution units. In exchange, Merced's compiler must organize programs into instruction streams that can be simultaneously executed.
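As a rough illustration of what "organizing programs into instruction streams that can be simultaneously executed" involves, here is a minimal Python sketch. It greedily packs a straight-line instruction list into groups whose members do not depend on one another. The (dest, sources) instruction format, the register names, and the width of 3 are assumptions made for the example and say nothing about Merced's actual encoding or issue width. The dependency check is deliberately conservative.

```python
# Toy model: pack a straight-line instruction list into groups whose
# members have no register dependencies on one another.
# The (dest, sources) format and the width limit are illustrative
# assumptions, not Merced's actual encoding.

def group_independent(instrs, width=3):
    groups = []
    current = []
    written = set()   # registers written by the current group
    read = set()      # registers read by the current group
    for dest, srcs in instrs:
        conflict = (
            any(s in written for s in srcs)   # read-after-write
            or dest in written                # write-after-write
            or dest in read                   # write-after-read (conservative)
        )
        if conflict or len(current) == width:
            # Close the current group and start a new one.
            groups.append(current)
            current, written, read = [], set(), set()
        current.append((dest, srcs))
        written.add(dest)
        read.update(srcs)
    if current:
        groups.append(current)
    return groups


program = [
    ("r1", ("r10", "r11")),
    ("r2", ("r12", "r13")),
    ("r3", ("r1", "r2")),    # needs r1 and r2, so it starts a new group
    ("r4", ("r14",)),
    ("r5", ("r3", "r4")),    # needs r3 and r4, so it starts a new group
]
groups = group_independent(program)
for index, group in enumerate(groups):
    print(index, [dest for dest, _ in group])
```

A real compiler would also reorder instructions to fill each group; this sketch only splits the list in program order.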
To obtain those streams, the compiler will rely heavily on the dual software techniques known as predication and speculation. The former removes unnecessary branches from a program; the latter masks memory latency by executing load instructions as early as possible.
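The two techniques can be sketched with toy models in Python. The first part shows predication: an if/else is turned into instructions that each carry a guard, so no branch is needed. The second part shows speculation: a load is started before the program knows whether the value is needed, and any fault is held back until the value is actually used. All mnemonics, register names and helper names (run, spec_load, check, lookup) are invented for this sketch, and the deferred-fault scheme is an assumption about how an early load can be made safe, not something the article describes.

```python
# Toy models only. Mnemonics, register names and helper names are
# invented for this sketch and are not IA-64 syntax.

# --- Predication: instructions guarded by a predicate instead of a branch ---

def run(program, regs):
    preds = {"p0": True}              # p0 is treated as always true here
    for pred, op, *args in program:
        if not preds.get(pred, False):
            continue                  # guarded instruction has no effect
        if op == "cmp_lt":            # writes a predicate and its complement
            p_true, p_false, a, b = args
            preds[p_true] = regs[a] < regs[b]
            preds[p_false] = not preds[p_true]
        elif op == "mov":
            dst, src = args
            regs[dst] = regs[src]
    return regs

# Branchy source:  if a < b: m = a  else: m = b
IF_CONVERTED = [
    ("p0", "cmp_lt", "p1", "p2", "a", "b"),
    ("p1", "mov", "m", "a"),
    ("p2", "mov", "m", "b"),
]

# --- Speculation: start a load early, surface any fault only on use ---

DEFERRED = object()                   # marks a load that would have faulted

def spec_load(memory, addr):
    return memory.get(addr, DEFERRED)

def check(value, memory, addr):
    if value is DEFERRED:
        return memory[addr]           # redo the load; raises KeyError here
    return value

def lookup(memory, addr, wanted):
    early = spec_load(memory, addr)   # hoisted above the branch
    if not wanted:
        return None                   # value unused, so no fault is raised
    return check(early, memory, addr) + 1

print(run(IF_CONVERTED, {"a": 3, "b": 7, "m": 0})["m"])
print(lookup({8: 41}, 8, True))
```

In the predicated program both mov instructions are always fetched, but only the one whose guard is true changes m. In lookup, a bad address causes an error only on the path that really uses the loaded value.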
However, few companies have the resources to create a compiler that's smart enough to implement such techniques. That's why Intel has written its own compiler back-end (the portion which performs the final, IA-64 code generation) and is providing it to key software houses. That list includes compiler vendors Metaware Inc. (Santa Cruz, Calif.) and Edinburgh Portable Compilers Ltd. (Edinburgh, Scotland). "They have language front end [for C, C++, Cobol], there's a well defined intermediate language, and then you plug in the IA-64 code generator," Dhulla said. Microsoft is also developing compilers for Merced, Dhulla said.
One company which does have sufficient compiler smarts is Hewlett-Packard, Intel's partner in defining the IA-64 architecture. HP is developing its own compiler and has recently spearheaded the release of an academic compiler called Trimaran that's aimed at university researchers studying Merced-like architectures.
Because Merced's performance depends in large measure on how adeptly compilers apply code-optimization techniques, one of the most interesting questions kicking around software circles has been: How much better is Merced's performance when it is able to take advantage of speculation and predication?
"There's no blanket answer," said Intel's Zahir. "Workloads that fit in the processor caches might run reasonably well [without the features]. Larger workloads like databases tend to benefit substantially by using speculation or predication because [these features] can manage the memory latency much better."
Sophisticated programmers know that the features can be turned on or off by commands provided to the compiler. Indeed, this ability to switch will be at the heart of IA-64 debugging techniques. "There are numerous options in the compiler to tune your code and to provide different levels of optimization," Zahir explained.
This is important because it provides developers with a safety margin when they're venturing into uncharted IA-64 territory by allowing them to incrementally turn on more advanced features as their code passes muster. "In terms of getting the performance you want, we view it as really important that you are able to carefully select the optimizations which are applied," Zahir said.
"That is a key capability the [emulator] gives you," Zahir said. "You can do a quick-and-dirty recompile and get the program to work. Then, in a second round, you can turn on some of the IA-64 compiler switches and see what happens. So, you can focus initially on getting a functional solution and then you do performance tuning."
To aid in boosting performance further, Intel has fielded a tuning tool called Vtune. The latest version, Vtune 3.0, includes hooks to enable it to work with Merced.
Still, debugging IA-64 apps won't be a cakewalk. Compilers that use code-optimization features such as predication and speculation end up rearranging programs so much that it's difficult to map the correspondence between the final machine code and the original high-level program.
"Debugging of optimized code has been a problem all along. I don't think there's anything new here with IA-64," stressed Zahir. "The minute the compiler starts reordering operations, the original program gets changed around. So sometimes you see errors cropping up which were not there in the order that you programmed them in. That already happens today. I think a lot of debugging has to happen at lower optimization levels, until you have a level of confidence that your code is doing what you expect it to do. Then the compiler can step in and start optimizing."
To make the process more reliable, Zahir noted that a big effort is in process to validate IA-64 compilers so that they don't introduce new bugs in their own right.
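The workflow Zahir describes (get a plain build working first, then switch on optimizations and see what changes) comes down to comparing a trusted baseline against a tuned variant on the same inputs. The Python sketch below is a minimal stand-in for that idea. The two functions play the role of an unoptimized and an optimized build of the same routine; first_mismatch and both function names are hypothetical and have nothing to do with Intel's tools.

```python
# Minimal differential check: run a baseline and a tuned variant on the
# same inputs and report the first input where they disagree.
# All names here are hypothetical stand-ins for "plain build" and
# "optimized build"; no real compiler is involved.

def first_mismatch(reference, optimized, inputs):
    for x in inputs:
        if reference(x) != optimized(x):
            return x
    return None


def sum_to_n_plain(n):
    # Straightforward loop: the version that is easy to reason about.
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_to_n_tuned(n):
    # Closed form: faster, but only equivalent for n >= 0.
    return n * (n + 1) // 2


print(first_mismatch(sum_to_n_plain, sum_to_n_tuned, range(0, 50)))
print(first_mismatch(sum_to_n_plain, sum_to_n_tuned, range(-3, 5)))
```

The tuned version agrees with the baseline on the inputs it was designed for and quietly diverges outside them, which is the kind of difference that is easier to find when the baseline is known to be good.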
Intel is also adding features on the hardware side of the Merced equation to help both hardware and software developers optimize performance. The features are a series of performance-monitoring registers, which reside on chip. Such registers have been a part of every Intel CPU since Pentium.
These include registers which keep a running count of, for example, instructions per second and cache misses. The registers can be monitored in real time without slowing the hardware or affecting the execution of the code.
However, in Merced, Intel is adding an unprecedented number of new counters. "New features that we've provided with Merced provide far greater resolution into which pieces of the code are causing performance issues," explained Zahir. "So we have an address-range check facility, which allows you to zoom in on particular pieces of the code."
Another register, Merced's "opcode matcher," enables the viewing of specific types of instructions.
For developers, the objective is to understand where the processor is spending its time. This information can then be used to identify "hot spots" in the code which could benefit from optimization.
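To make the idea concrete, here is a small Python sketch of counting events only inside an address range, optionally for one instruction type, and then picking out the hottest address. It works on a made-up software trace, whereas the real registers do this in hardware. The trace format (address, opcode, cache-miss flag), the profile function and the sample data are all assumptions for illustration.

```python
from collections import Counter

# Toy software analogue of range-filtered event counters.
# The trace format and the function name are illustrative assumptions.

def profile(trace, lo, hi, opcode=None):
    """Count cache misses per address in [lo, hi), optionally for one opcode."""
    hits = Counter()
    for addr, op, cache_miss in trace:
        if not (lo <= addr < hi):
            continue                  # outside the address range of interest
        if opcode is not None and op != opcode:
            continue                  # not the instruction type being matched
        if cache_miss:
            hits[addr] += 1
    return hits


trace = [
    (0x100, "load", True),
    (0x104, "add", False),
    (0x108, "load", True),
    (0x100, "load", True),
    (0x200, "load", True),    # a miss, but outside the range below
    (0x100, "load", False),
]

# The address with the most cache misses among loads in [0x100, 0x110).
hot = profile(trace, 0x100, 0x110, opcode="load").most_common(1)
print(hot)
```

Narrowing the range and the instruction type first keeps the counts focused on the code under study, which is the point of the address-range and opcode filters described above.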
On a broader scale, information gleaned by Intel will ultimately be factored back into future compilers and Merced optimization algorithms. "All of these are capabilities that have never been provided in any other microprocessor, to my knowledge," Zahir said. "I think they'll greatly enhance the capabilities of people doing software tuning and analysis in the field in real-time."
Developers creating 64-bit applications with help from an emulator may wonder how such code will work when running under real-world timing conditions on Merced hardware.
According to Intel, the Merced emulator is "one for one" in terms of instruction-set emulation. "It's a functional emulator that does exactly what the hardware will do," Zahir said.
Of course, once actual Merced silicon arrives, developers will have to test out their software amid real-world timing, which may produce results that are somewhat different from the simulated environment.
Even with timing-dependent items, "the amount of effort you need to put in will be an order of magnitude less, if you've made things work on the emulator," Zahir claimed. "And remember this emulator provides the entire platform interface as well. You're not just debugging your code with memory. There's actual real devices out there that you can talk to. So you can get things like interrupts with multiprocessing."