Joe, Re: "Which one occupies the largest real estate? If L2 and decode run at 1/2 speed, plus trace cache which does, they may constitute the majority of real estate on the chip."
Intel says it's the nominal clock, so I am comfortable with that. The only question is what defines "majority", since 51% and 99% both qualify.
The more I think about it, the more likely it seems that the caches do operate at one half the nominal clock. As Ali suggested, they are probably interleaved to give the same bandwidth as a cache running at double the speed. Intel lists the L1 bandwidth at 48GB/s for a 1.5GHz Pentium 4. Assuming a 750MHz clock and a 32-byte data path, a single bank delivers only 24GB/s (750MHz × 32 bytes), so the only way to reach 48GB/s is 2-way interleaving. Since the L1 latency is listed as 2 cycles (two cycles at 1.5GHz take the same time as one cycle at 750MHz), this is consistent with, but does not confirm, that assumption.
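For anyone who wants to check the numbers, here is a back-of-the-envelope sketch. It assumes the figures above (1.5GHz nominal clock, a half-speed 750MHz cache, a 32-byte path) rather than anything from Intel's documentation. The `accesses_completed` function is a hypothetical toy model of bank interleaving, not a description of how the Pentium 4 actually does it: one access is issued per nominal-clock cycle to alternating banks, and each bank stays busy for two nominal cycles because it runs at half speed.

```python
# Back-of-the-envelope check of the L1 bandwidth/latency reasoning above.
# Assumptions (from the post, not from Intel docs): the cache runs at half
# the 1.5GHz nominal clock and has a 32-byte data path.

NOMINAL_HZ = 1.5e9          # 1.5GHz Pentium 4 nominal clock
CACHE_HZ = NOMINAL_HZ / 2   # assumed half-speed cache clock (750MHz)
PATH_BYTES = 32             # assumed data-path width in bytes


def bandwidth_gb_s(clock_hz: float, path_bytes: int, banks: int) -> float:
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes) with `banks`-way interleaving."""
    return clock_hz * path_bytes * banks / 1e9


def latency_ns(cycles: int, clock_hz: float) -> float:
    """Convert a latency in clock cycles to nanoseconds."""
    return cycles / clock_hz * 1e9


def accesses_completed(fast_cycles: int, banks: int, bank_busy: int = 2) -> int:
    """Hypothetical toy model: every fast (nominal) cycle tries to issue one
    access to bank (cycle % banks); a bank that is still busy simply misses
    that issue slot. Each bank is busy for `bank_busy` fast cycles, i.e. it
    runs at 1/bank_busy of the nominal clock."""
    free_at = [0] * banks           # fast cycle at which each bank is free again
    served = 0
    for cycle in range(fast_cycles):
        bank = cycle % banks
        if free_at[bank] <= cycle:  # bank idle: accept the access
            free_at[bank] = cycle + bank_busy
            served += 1
    return served


if __name__ == "__main__":
    print(bandwidth_gb_s(CACHE_HZ, PATH_BYTES, banks=1))  # 24.0
    print(bandwidth_gb_s(CACHE_HZ, PATH_BYTES, banks=2))  # 48.0
    # 2 nominal cycles and 1 half-speed cycle both come to about 1.33 ns
    print(latency_ns(2, NOMINAL_HZ), latency_ns(1, CACHE_HZ))
    # A single half-speed bank serves every other slot; two banks serve them all
    print(accesses_completed(100, banks=1), accesses_completed(100, banks=2))
```

In the toy model, one half-speed bank can accept only every other access, while two alternating banks accept one access per nominal cycle. That is the "same bandwidth as a cache of double the speed" effect described above.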
As for other units using the nominal clock, I am sure that every pipeline buffer must be running off it (with the exception of the execute stage of the "fast" ALUs). The queues in the out-of-order unit are probably running off the fast clock, and I'm sure the dispatch unit does as well. My opinion on the decoder is that it, too, runs at the nominal clock, but I'm sure we'll eventually find code that adequately tests it; we just aren't there yet.
wbmw