Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: Charles Gryba who wrote (71869) 2/18/2002 3:59:26 PM
From: Elmer
 
I think at some point the P4 will have enough cache that it will make up for the lousy design.

Everyone else should have such a bad design. It's just the fastest integer processor in the universe, and the only thing in its league is the POWER4 design from IBM. They need SOI, over 1.5 MB of on-die cache and 128 MB of L3 to tie the P4 in SPECint, and that's without HT running on the P4. Your description of a lousy design is absurd by definition but not unexpected.

EP



To: Charles Gryba who wrote (71869) 2/18/2002 7:52:17 PM
From: Gopher Broke
 
That it will be fed quickly enough that it won't matter that it has to deal with bubble latencies.

I don't understand your reasoning. I am not a hardware person, but my understanding is that if, for example, a 10 GHz processor spends 50% of its time waiting for bubbles to clear then it is, at best, the equivalent of a bubble-free 5 GHz processor.

The cache is all about the processor not having to wait for FSB memory access, not pipeline bubbles. So, giving it more cache will certainly bring it up from below that 50% limit, but without doing anything about the bubbles it hits a wall at 50%.
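A rough sketch of that arithmetic, assuming (simplistically) that bubble stalls and memory stalls are separate fractions of time that just add up. The function name and the memory stall fractions are made up for illustration; only the 10 GHz and 50% figures come from the example above.

```python
def effective_ghz(clock_ghz, bubble_frac, memory_frac):
    """Clock rate of a stall-free processor doing the same useful work.

    Assumption: the two stall fractions are independent and additive.
    """
    busy_frac = max(0.0, 1.0 - bubble_frac - memory_frac)
    return clock_ghz * busy_frac


# More cache shrinks the memory stall fraction; bubbles stay fixed at 50%.
for memory_frac in (0.30, 0.10, 0.0):
    print(memory_frac, effective_ghz(10.0, 0.5, memory_frac))
```

Under this model, driving the memory stall fraction to zero can never lift the result above 5 GHz, which is the wall described above.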

Handling bubbles requires either better branch prediction (or early mispredict detection, I guess) or multiple parallel instruction pipes.

I guess that instruction pipes can only be run in parallel up to the point where the instructions might interact (memory accesses, for example). Is that why Intel took the approach in the P4 of decoding x86 instructions into primitive micro-ops in parallel pipes and then feeding those micro-ops into the one main processing pipe? The problem is that this only addresses a small fraction of the long P4 pipeline.

Bottom line is that the P4 certainly needs a bigger L2 cache than Athlon/Hammer because the long pipe amplifies the FSB memory stall impact (actually it needs both a bigger cache and better pre-fetch algorithms, I guess). But when there is enough L2 to make FSB accesses insignificant then bubbles become more of a problem, not less.
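To put made-up numbers on that last point, here is a small sketch using a simple additive cycles-per-instruction (CPI) model. The model and every figure in it are assumptions for illustration, not measurements of the P4 or Athlon.

```python
def stall_breakdown(base_cpi, bubble_cpi, memory_cpi):
    """Return (total CPI, fraction of all cycles lost to bubbles)."""
    total = base_cpi + bubble_cpi + memory_cpi
    return total, bubble_cpi / total


# Hypothetical figures, chosen only to show the trend.
BASE_CPI = 1.0    # cycles per instruction with no stalls at all
BUBBLE_CPI = 0.5  # extra cycles per instruction from pipeline bubbles

for memory_cpi in (1.5, 0.5, 0.0):  # bigger L2 -> fewer memory stall cycles
    total, bubble_share = stall_breakdown(BASE_CPI, BUBBLE_CPI, memory_cpi)
    print(f"memory CPI {memory_cpi}: total CPI {total}, bubble share {bubble_share:.0%}")
```

The total CPI falls as the cache grows, but the share of time lost to bubbles rises, since the bubble cost itself never changes.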

But I am just guessing at most of this stuff :)