To: pgerassi who wrote (74101)
3/9/2002 3:16:39 AM
From: Petz

Register renaming does not store temporary results in a program. Those results must be written to L1 cache, then L2 cache and finally to memory. The CPU cannot make assumptions about future code not read yet.

I said exactly the same thing -- that the value would eventually get written to memory. However, the CPU should recognize (or, should I say, could recognize) that it need not load from the memory location because the value is still in one of the copies of the EAX register. Maybe I am wrong about that -- how would the CPU know whether the temp variable is in a DMA buffer, which could invalidate the copy in the register? (A compiler would know because the variable would be declared 'volatile'.) It couldn't know unless it goes through the mechanics of doing a memory read, in which case any DMA should have invalidated the cache copies of the data and forced another memory read.

And you are right that a compiler designed for 64-bit mode would be aware of more "original registers", which would usually allow it to substitute a register for the TEMPxx memory variables in my example.

Register renaming is used only when a register will hold a result which may or may not be overwritten. Thus there may be a copy of EAX for each of the stages within the execution pipeline. If that's 6 stages then you may have 7 copies (the original plus a copy at the end of each stage) in 7 different virtual registers.

Can't register renaming be used over a window deeper than the length of the execution pipeline? If there are enough registers available, you might as well use them. Then the EAX register used 4 instructions ago in a store instruction would still be available for a load from the same address. (Hmm, that is a little difficult, because the CPU would have to store memory addresses for each register someplace and do a 64-bit compare to find the register containing the value.)
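To make that parenthetical concrete, here is a rough toy model in C of the bookkeeping I have in mind: each renamed register remembers the address it was last stored to, and a load compares its address against those tags before going to memory. Every name here (PhysReg, NUM_PHYS_REGS, note_store, note_external_write, try_forward_load) is made up for illustration, and the pool size of 8 is arbitrary. This is only a sketch of the idea, not a description of how the Athlon or any other real CPU does it; real hardware would do the comparisons in parallel, not in a loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PHYS_REGS 8   /* hypothetical size of the rename pool */

/* One renamed (physical) register plus the address it was last stored to. */
typedef struct {
    uint64_t value;  /* the register contents */
    uint64_t addr;   /* address tag recorded by the most recent store */
    bool     valid;  /* true while the tag can still be trusted */
} PhysReg;

static PhysReg regs[NUM_PHYS_REGS];
static unsigned next_slot;  /* simple round-robin allocation */

/* Record a store: remember which value went to which address. */
void note_store(uint64_t addr, uint64_t value)
{
    /* Drop any older tag for this address so only the newest store matches. */
    for (size_t i = 0; i < NUM_PHYS_REGS; i++)
        if (regs[i].valid && regs[i].addr == addr)
            regs[i].valid = false;

    regs[next_slot] = (PhysReg){ .value = value, .addr = addr, .valid = true };
    next_slot = (next_slot + 1) % NUM_PHYS_REGS;
}

/* A write from outside the core (DMA, say) must invalidate a matching tag. */
void note_external_write(uint64_t addr)
{
    for (size_t i = 0; i < NUM_PHYS_REGS; i++)
        if (regs[i].valid && regs[i].addr == addr)
            regs[i].valid = false;
}

/* Try to satisfy a load from a register. Returns true on a hit and writes
 * the value to *out; on a miss the caller would do a normal memory read. */
bool try_forward_load(uint64_t addr, uint64_t *out)
{
    for (size_t i = 0; i < NUM_PHYS_REGS; i++) {
        if (regs[i].valid && regs[i].addr == addr) {  /* the 64-bit compare */
            *out = regs[i].value;
            return true;
        }
    }
    return false;
}
```

The note_external_write hook is the DMA worry from above: without something like it, a forwarded load could return a stale value.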
I guess it's a lot easier for a compiler to get rid of temp variables than it is for a CPU running RISC86 micro-ops.

The [function] calls are helped because more registers are available to hold parameters instead of them being on the stack with all of the attendant cache costs.

Hadn't thought of that. Probably very common on compilers for RISC machines and should be for AMD-64 mode too. Do you know if SuSE has finished gcc yet for AMD-64?

All in all, 64 bit "long" mode will cause many compilers to generate better, faster, more optimal code.

You've convinced me. Elimination of temps and memory references, and lower function call overhead, should balance out the one-byte preambles of the relatively few 64-bit instructions that need them. Vector and matrix processes generally use a bunch of pointers which can all be in registers. Doing an FFT on x86 is a BITC|-|; I've looked at the compiled code, and it's not very efficient.

Petz