Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: graphicsguru who wrote (239544) 8/29/2007 3:34:54 PM
From: fastpathguru
 
Remember that processor state is extremely subtle,
given that you're also doing out-of-order and superscalar
and speculative execution after branch prediction.
This sort of mechanism interacts with all of those.
And I doubt that you can set things up to easily snapshot
the whole processor state without a huge impact on
critical paths.


You're over-engineering it. Saving the state of a thread so that it can be restarted (i.e., a context switch) is well-understood territory. This is a highly optimizable special case of a context switch.

It would be silly to save the entire state of the core, especially when the processor is sitting around doing nothing but retiring in-flight instructions and waiting for the read to fill anyway. Once the read misses, let the core drain any in-flight instructions, and then save the thread's state "simply", just like a context switch. (But save it on chip, not to memory. Reserve a cache line or something similar.)
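A toy model of the drain, save, and rewind sequence described above. It is a sketch under loud assumptions, not a description of any real core: `Core`, `checkpoint_slot`, `drain`, `checkpoint`, and `rollback` are hypothetical names, the 8-register file is made up, and a Python tuple stands in for the reserved on-chip cache line.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Core:
    """Hypothetical toy core: architectural registers, a PC, and an
    on-chip checkpoint slot standing in for a reserved cache line."""
    regs: Dict[str, int] = field(default_factory=lambda: {f"r{i}": 0 for i in range(8)})
    pc: int = 0
    checkpoint_slot: Optional[Tuple[int, Dict[str, int]]] = None
    # In-flight instructions, modeled as (destination register, result) pairs.
    in_flight: List[Tuple[str, int]] = field(default_factory=list)

    def drain(self) -> None:
        # Retire every in-flight instruction first, so only architectural
        # state (not pipeline state) has to be captured.
        while self.in_flight:
            dest, value = self.in_flight.pop(0)
            self.regs[dest] = value

    def checkpoint(self) -> None:
        # Like a context switch, but the copy stays on chip, not in memory.
        self.drain()
        self.checkpoint_slot = (self.pc, dict(self.regs))

    def rollback(self) -> None:
        # The rare case: flush pipeline work and overwrite register state
        # from the saved slot.
        assert self.checkpoint_slot is not None, "no checkpoint to restore"
        self.pc, saved = self.checkpoint_slot
        self.regs = dict(saved)
        self.in_flight.clear()


core = Core(pc=100)
core.in_flight = [("r1", 5), ("r2", 7)]
core.checkpoint()                    # drains r1=5, r2=7, then saves state
core.pc, core.regs["r1"] = 140, 999  # speculative work past the miss
core.rollback()                      # restore the saved state
print(core.pc, core.regs["r1"], core.regs["r2"])  # 100 5 7
```

Draining before saving is what keeps the checkpoint small: once nothing is in flight, the register file and PC are the whole story, exactly as in a context switch.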

If the choice is between

A) stalling for 200 cycles without worrying about rewinding/restarting,

or

B) draining for 40 cycles, saving state for 40 cycles, and stalling for just 50 more cycles, knowing I can rewind/restart by flushing and overwriting register state from a cache line in the rare case it's necessary,

you choose B.
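A back-of-envelope check of the A-versus-B choice, using the cycle figures from the post. Two assumptions are mine, not the post's: every cycle listed for B (drain, save, stall) is counted as overhead, and a rewind costs a hypothetical `rollback_cost` extra cycles with probability `p_rollback`. The function names are illustrative only.

```python
def expected_overhead_b(drain: int, save: int, stall: int,
                        p_rollback: float, rollback_cost: int) -> float:
    """Expected non-productive cycles per miss for option B.

    drain/save/stall come from the post; p_rollback and rollback_cost
    are hypothetical inputs, not figures from the post.
    """
    return drain + save + stall + p_rollback * rollback_cost


def break_even_rollback_rate(stall_a: int, drain: int, save: int,
                             stall: int, rollback_cost: int) -> float:
    """Largest rollback probability at which B still costs no more than A."""
    return (stall_a - (drain + save + stall)) / rollback_cost


STALL_A = 200                      # option A: stall for the whole miss
DRAIN, SAVE, STALL_B = 40, 40, 50  # option B, figures from the post
ROLLBACK_COST = 200                # hypothetical: a rewind wastes a full miss

print(expected_overhead_b(DRAIN, SAVE, STALL_B, 0.1, ROLLBACK_COST))
print(break_even_rollback_rate(STALL_A, DRAIN, SAVE, STALL_B, ROLLBACK_COST))
```

Under these assumptions B spends 130 cycles per miss plus the occasional rewind, and it stays ahead of A as long as rewinds happen on fewer than 35% of misses. That is the "rare chance" the argument leans on.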

You could even turn it on and off with a mechanism like a branch predictor.
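One way to read "a branch prediction-like mechanism" is a table of 2-bit saturating counters, the classic branch-predictor structure, indexed by the address of the missing load. This is a hypothetical sketch: `CheckpointPredictor`, `table_bits`, and the notion of a checkpoint having "paid off" (say, the work done past the miss was kept rather than rolled back) are my assumptions, not something the post specifies.

```python
class CheckpointPredictor:
    """Hypothetical on/off switch for checkpointing, built from a table of
    2-bit saturating counters as used in classic branch predictors."""

    def __init__(self, table_bits: int = 10) -> None:
        self.mask = (1 << table_bits) - 1
        # Every entry starts at 2, "weakly enabled".
        self.counters = [2] * (1 << table_bits)

    def _index(self, load_pc: int) -> int:
        # Index by the low bits of the missing load's PC.
        return load_pc & self.mask

    def should_checkpoint(self, load_pc: int) -> bool:
        # Counter values 2 and 3 mean "checkpointing has been paying off".
        return self.counters[self._index(load_pc)] >= 2

    def update(self, load_pc: int, paid_off: bool) -> None:
        # Saturate at 0 and 3 so a single outlier cannot flip a
        # strongly held prediction.
        i = self._index(load_pc)
        if paid_off:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


pred = CheckpointPredictor()
pc = 0x4004F0
pred.update(pc, paid_off=False)  # counter 2 -> 1
print(pred.should_checkpoint(pc))  # False
pred.update(pc, paid_off=True)   # counter 1 -> 2
print(pred.should_checkpoint(pc))  # True
```

The hysteresis matters here for the same reason it does in branch prediction: a load that usually benefits from checkpointing keeps it enabled even after one bad outcome.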

The beauty of it is that this would only take up cycles that would otherwise be left idle.

fpg