Silicon Investor (SI) -- The First Internet Community


Technology Stocks : Advanced Micro Devices - Moderated (AMD)

To: Petz who wrote (74087)	3/8/2002 5:32:52 PM
From: minnow68

Petz,

You wrote: "The memory writes to TEMPIJ and TEMPKL are probably inconsequential for execution speed."

Even if there is zero impact on execution speed, they very much have an impact on code density. For example, SUB EBX,EDX is going to take significantly fewer bytes of machine code than SUB EBX,TEMPIJ.

Also, keep in mind that even if keeping things in registers is only a little bit faster, it's still a big deal, because we are talking about removing __90%__ of the memory references. Imagine the impact if even just one percent of those references had gone to main memory instead of cache.

Mike
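
To put rough numbers on the code-density point, here is a small sketch that compares hand-assembled 32-bit x86 encodings of the register form and the memory form of SUB. The address used for TEMPIJ (0x00403000) and the stack offset (-8) are made-up placeholders, and `ENCODINGS` and `encoded_length` are names invented for this illustration; what matters is the byte counts, not the specific addresses.

```python
# Hand-assembled 32-bit x86 encodings of SUB EBX,<source>.
# Opcode 0x2B is "SUB r32, r/m32"; the second byte is the ModRM byte.
# ASSUMPTION: TEMPIJ's absolute address and stack offset are placeholders.
ENCODINGS = {
    # register source: opcode + ModRM
    "SUB EBX,EDX": bytes([0x2B, 0xDA]),
    # static variable: opcode + ModRM + 32-bit absolute address
    "SUB EBX,TEMPIJ (static)": bytes([0x2B, 0x1D, 0x00, 0x30, 0x40, 0x00]),
    # stack local at [EBP-8]: opcode + ModRM + 8-bit displacement
    "SUB EBX,TEMPIJ (stack)": bytes([0x2B, 0x5D, 0xF8]),
}


def encoded_length(instruction: str) -> int:
    """Return the number of machine-code bytes for one listed instruction."""
    return len(ENCODINGS[instruction])


if __name__ == "__main__":
    for instruction, code in ENCODINGS.items():
        print(f"{instruction:<26} {len(code)} bytes  {code.hex(' ')}")
```

Under these assumptions the register form is 2 bytes, against 6 bytes for a statically addressed TEMPIJ or 3 bytes for a stack local with a small offset.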

To: Petz who wrote (74087)	3/8/2002 7:48:19 PM
From: pgerassi

Dear Petz:

Register renaming does not store a program's temporary results. Those results must still be written to L1 cache, then L2 cache, and finally to memory. The CPU cannot make assumptions about future code it has not read yet.

Register renaming is used only when a register will hold a result that may or may not be overwritten. Thus there may be a copy of EAX for each of the stages within the execution pipeline. If that's 6 stages, then you may have 7 copies (the original plus a copy at the end of each stage) in 7 different virtual registers. The retirement stage is where the virtual register copy becomes the new original copy. A mispredicted branch could invalidate all 6 stage copies, leaving the original copy intact.

Having more original registers lets the compiler keep temporary and quickly discarded results in registers instead of memory. This helps both function calls and automatic variables. The calls are helped because more registers are available to hold parameters instead of them being on the stack, with all of the attendant cache costs. Some auto variables will be optimized to register status.

Note: Switching between 32-bit modes and 64-bit mode does not retain the upper 32 bits of the registers. Otherwise severe penalties would accrue to out-of-order (OO) execution. This is the wisest course IMHO. This is not a problem with 64-bit code and a 64-bit OS's system calls and APIs. And well-written 64-bit OSes will not have a problem with 32-bit code (they assume that nothing is valid above the lower 32 bits and save anything they need before going down to 32-bit mode).

All in all, 64-bit "long" mode will let many compilers generate better, faster code. In some situations, 100+% improvements may be possible. Most of these will be with matrix processes.

Pete
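
The renaming scheme described in this post (one original copy, speculative copies that become the original at retirement, and a flush on a mispredicted branch) can be sketched as a toy model. This is only an illustration of the description above, not how any particular CPU implements renaming; `RenameModel` and its method names are invented for the example.

```python
class RenameModel:
    """Toy model: committed ("original") registers plus in-flight copies."""

    def __init__(self, register_names):
        # The "original" copy of each architectural register.
        self.committed = {name: 0 for name in register_names}
        # Speculative (register, value) copies, oldest first.
        self.in_flight = []

    def write(self, register, value):
        """A speculative result gets its own copy instead of overwriting."""
        self.in_flight.append((register, value))

    def read(self, register):
        """Newest in-flight copy wins; otherwise fall back to the original."""
        for name, value in reversed(self.in_flight):
            if name == register:
                return value
        return self.committed[register]

    def copies_of(self, register):
        """Original copy plus every in-flight copy of this register."""
        return 1 + sum(1 for name, _ in self.in_flight if name == register)

    def retire(self):
        """The oldest speculative copy becomes the new original copy."""
        register, value = self.in_flight.pop(0)
        self.committed[register] = value

    def flush(self):
        """Mispredicted branch: discard all speculative copies."""
        self.in_flight.clear()


if __name__ == "__main__":
    model = RenameModel(["EAX"])
    for stage in range(1, 7):        # six in-flight writes to EAX
        model.write("EAX", stage)
    print(model.copies_of("EAX"))    # original + 6 speculative copies
    model.flush()                    # misprediction invalidates the 6 copies
    print(model.read("EAX"))         # the original copy is still intact
```

With six writes in flight the model holds seven copies of EAX, and a flush leaves only the committed value, which matches the 6-stage example in the post.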