Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: pgerassi who wrote (74101) 3/9/2002 3:16:39 AM
From: Petz
 
Register renaming does not store temporary results in a program. Those results must be written to L1 cache, then L2 cache and finally to memory. The CPU cannot make assumptions about future code not read yet.

I said exactly the same thing: the value would eventually get written to memory. However, the CPU should recognize (or, should I say, could recognize) that it need not load from the memory location, because the value is still in one of the copies of the EAX register. Maybe I am wrong about that. How would the CPU know whether the temp variable is in a DMA buffer, where a device write could invalidate the copy in the register? (A compiler would know because the variable would be declared 'volatile'.) It couldn't know unless it goes through the mechanics of doing a memory read, in which case any DMA should have invalidated the cache copies of the data and forced another memory read. And you are right that a compiler designed for 64-bit mode would be aware of more "original registers", which would usually allow it to substitute a register for the TEMPxx memory variables in my example.

Register renaming is used only when a register will hold a result which may or may not be overwritten. Thus there may be a copy of EAX for each of the stages within the execution pipeline. If that's 6 stages, then you may have 7 copies (the original plus a copy at the end of each stage) in 7 different virtual registers.

Can't register renaming be used over a window deeper than the length of the execution pipeline? If there are enough registers available, you might as well use them. Then the EAX register used 4 instructions ago in a store instruction would still be available for a load from the same address. (Hmm, that is a little difficult, because the CPU would have to store memory addresses for each register someplace and do a 64-bit compare to find the register containing the value.) I guess it's a lot easier for a compiler to get rid of temp variables than the CPU running RISC86 micro-ops.

The [function] calls are helped because more registers are available to hold parameters instead of them being on the stack with all of the attendant cache costs.

Hadn't thought of that. Probably very common on compilers for RISC machines and should be for AMD-64 mode too. Do you know if SuSE has finished gcc yet for AMD-64?

All in all, 64-bit "long" mode will cause many compilers to generate better, faster, more optimal code.

You've convinced me. Elimination of temps and memory references, plus lower function call overhead, should balance out the one-byte prefixes of the relatively few 64-bit instructions that need them. Vector and matrix processes generally use a bunch of pointers which can all be in registers. Doing an FFT on x86 is a BITC|-|; I've looked at the compiled code, and it's not very efficient.

Petz
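
As a rough illustration of the temp-variable point above, here is a minimal C sketch. The function names and the three-element dot product are hypothetical, not the code discussed in the thread. The first version declares its temporaries 'volatile', which obliges the compiler to perform a real store and load for each one (the same qualifier mentioned above for DMA buffers). The second uses plain locals, which an optimizing compiler is free to keep in registers when enough are available. Whether it actually does so depends on the compiler, its flags, and the target.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: temporaries forced through memory.
   'volatile' requires an actual store and an actual load per access,
   so the compiler may not keep these values only in registers. */
long dot3_memory_temps(const long *a, const long *b)
{
    volatile long temp0, temp1, temp2;

    assert(a != NULL && b != NULL);
    temp0 = a[0] * b[0];
    temp1 = a[1] * b[1];
    temp2 = a[2] * b[2];
    return temp0 + temp1 + temp2;   /* three loads from memory */
}

/* Same arithmetic with ordinary locals. The compiler is allowed
   (not required) to hold t0..t2 in registers and never touch memory. */
long dot3_register_temps(const long *a, const long *b)
{
    long t0, t1, t2;

    assert(a != NULL && b != NULL);
    t0 = a[0] * b[0];
    t1 = a[1] * b[1];
    t2 = a[2] * b[2];
    return t0 + t1 + t2;
}
```

Both functions return the same value; comparing the assembly a compiler emits for each (for example with an option such as -S) is one way to see how many memory references the temporaries cost on a given target.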



To: pgerassi who wrote (74101) 3/9/2002 9:01:58 AM
From: peter_luc
 
Dear Pete,

According to c't (Andreas Stiller), "first performance considerations" point to 15% higher performance with code only 5% larger in x86-64 long mode, thanks to the larger number of registers. This, according to Andreas Stiller, could make x86-64 interesting not only for servers but for desktops as well.

See ix.de (in German).

Peter



To: pgerassi who wrote (74101) 3/9/2002 6:51:57 PM
From: Joe NYC
 
Pete,

This helps both function calls and automatic variables. The calls are helped because more registers are available to hold parameters instead of them being on the stack with all of the attendant cache costs. Some auto variables will be optimized to register status.

Can this be done (using registers instead of pushing parameters on the stack)? I have my doubts about it. Isn't the result just the opposite? That is, when a function is called, don't more registers need to be saved?

My assembly skills are extremely rusty (as you can probably tell from the question), so maybe you can refresh my memory. What happens when a function is called? Can you assume that the state of the registers is the same as prior to the call? If the answer is yes, they need to be stored someplace (on the stack), so don't 8 extra registers present extra overhead? Is there a shortcut for finding out which registers need to be saved?

Joe
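
On the calling-convention question, a small C sketch may help. It assumes the System V x86-64 calling convention used by Linux compilers; that is an assumption, not something stated in the thread, and sum6 and sum6_twice are hypothetical names. Under that convention the first six integer arguments are passed in registers (RDI, RSI, RDX, RCX, R8, R9), and the registers are split into a caller-saved set and a callee-saved set, so a function saves only the callee-saved registers it actually modifies instead of all of them.

```c
/* Assumption: System V x86-64 ABI.
   All six arguments arrive in registers (RDI, RSI, RDX, RCX, R8, R9);
   nothing has to be pushed on the stack to pass them. */
long sum6(long a, long b, long c, long d, long e, long f)
{
    return a + b + c + d + e + f;
}

/* The argument registers are caller-saved, so the first call may
   overwrite them. Values needed after that call (here a..f and 'first')
   must live in callee-saved registers (RBX, RBP, R12-R15) or be spilled
   to the stack. The compiler decides this per function, based on which
   values are live across each call. */
long sum6_twice(long a, long b, long c, long d, long e, long f)
{
    long first = sum6(a, b, c, d, e, f);
    long second = sum6(f, e, d, c, b, a);
    return first + second;
}
```

So the state of the caller-saved registers cannot be assumed unchanged after a call, while the callee-saved ones can. The extra registers add save/restore cost only where a function really uses them across a call.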