Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: Joe NYC who wrote (62245)    11/5/2001 3:03:02 PM
From: wanna_bmw
 
Joe, Re: "How can you possibly optimize your program for the size of the trace cache?"

Simple. The trace cache holds about 12k uops. Applications contain structures called 'loops', which are sections of code that run over and over again. To optimize for the trace cache, make sure that the instructions in these commonly executed loops decode to no more than 12k uops. Subroutines are another code structure, and many are called regularly by applications. For the code writers: make sure that these routines, too, stay under the number of instructions that would produce 12k uops. As long as you minimize the number of trace cache MISSES, your application will run a lot faster, because the decoder will rarely need to be used.
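
To put a rough number on that budget, here is a minimal sketch in Python. It takes the 12k figure above as 12,000 uops, and it assumes an average of 1.5 uops per x86 instruction. That ratio is a made-up placeholder, not a measured value, and the function names are purely illustrative. Real code would need to be profiled to get an actual uop count.

```python
# Figure from the post: the trace cache holds roughly 12k uops.
TRACE_CACHE_UOPS = 12_000  # "12k" taken as 12,000 for this sketch


def estimated_uops(instruction_count, uops_per_instruction=1.5):
    """Estimate decoded uops for a loop or routine.

    uops_per_instruction is an assumed average, not a measured figure.
    """
    return instruction_count * uops_per_instruction


def fits_in_trace_cache(instruction_count, uops_per_instruction=1.5):
    """True if the estimated uop count stays within the trace cache budget."""
    return estimated_uops(instruction_count, uops_per_instruction) <= TRACE_CACHE_UOPS


def max_instructions(uops_per_instruction=1.5):
    """Largest instruction count whose estimated uops still fit."""
    return int(TRACE_CACHE_UOPS // uops_per_instruction)


print(fits_in_trace_cache(5_000))   # a 5,000-instruction loop
print(fits_in_trace_cache(10_000))  # a 10,000-instruction loop
print(max_instructions())           # instruction budget at the assumed ratio
```

Under the assumed ratio, the budget works out to a few thousand instructions per hot loop or routine. A different ratio moves the limit proportionally.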

Re: "you have no control what's in the trace cache, since there may be other apps running in the background, the more apps, more trace cache trashing there has to be"

From what I understand, Windows task switches between applications roughly every 50us, an interval that can be altered through priority settings. 50us is 50,000ns. For a 2GHz CPU, with a 0.5ns clock period, that's 100,000 clock cycles, or on the order of 100,000 instructions, from the same application before it switches. IMO, that's enough to make optimizing for the trace cache worthwhile, and it gives a long stretch of CPU time where decoder accesses can be minimized.
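
The arithmetic above can be checked with a short Python sketch. The function name is hypothetical, and the 50us slice is only the figure as I understand it, not a verified Windows value. Microseconds times MHz gives clock cycles directly, since 1 MHz is one cycle per microsecond.

```python
def cycles_per_time_slice(slice_us, clock_mhz):
    """Clock cycles available in one time slice.

    slice_us:  length of the time slice in microseconds (50 is the post's figure)
    clock_mhz: CPU clock in MHz (2000 MHz = 2 GHz)
    """
    return slice_us * clock_mhz


cycles = cycles_per_time_slice(50, 2000)
print(cycles)  # 100000
```

Treating cycles as instructions assumes about one instruction per clock, so the instruction count is an order-of-magnitude figure rather than an exact one.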

wanna_bmw