Technology Stocks : Advanced Micro Devices - Moderated (AMD)

To: Ali Chen who wrote (227214), 3/2/2007 8:52:23 AM
From: DDB_WO | Read Replies (3) of 275872
 
Ali - Amato said 3.2X *)

You made a good point about IPC:
Speaking about theoretical constructions, I remember AMD presentations of K6 capabilities to run 4 instructions per clock. In reality, this number rarely exceeded something like 0.2. This is a typical range of disconnect between an architect world and reality. I think the statements from AMD marketing officials are very irresponsible.

This is what most people seem to miss: IPC is not the max sustainable throughput of a CPU's decoding/execution/retirement pipeline, but of the system as a whole. If there are too many cache misses, mispredicted branches etc., a wonderful assumed IPC of 3 or 4 can drop sharply. I remember Mitch Alsup stating an (assumed, but likely close to the mark) IPC of 1.0 for Opteron in a Usenet discussion.

E.g. (for everyone else, not you, since you understand this concept): if a block of code (say 10 instructions) depends on data that isn't in the L1, execution stalls for, say, 14 cycles before it can continue. With an execution throughput of 3 IPC, this sequence takes 14 + ceil(10/3) = 18 cycles on the 3 IPC machine, versus 14 + ceil(10/4) = 17 cycles on the 4 IPC machine.
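The arithmetic above can be sketched in a few lines; `cycles_for_block` is just a name I made up for the stall-plus-throughput model in the example, not anything from AMD:

```python
from math import ceil

def cycles_for_block(n_instructions, ipc, stall_cycles):
    """Total cycles for an instruction block preceded by a cache-miss stall."""
    return stall_cycles + ceil(n_instructions / ipc)

# The numbers from the post: 10 instructions behind a 14-cycle stall.
three_wide = cycles_for_block(10, 3, 14)  # 14 + ceil(10/3) = 18
four_wide = cycles_for_block(10, 4, 14)   # 14 + ceil(10/4) = 17
print(three_wide, four_wide)

# Effective IPC collapses far below the machine width in both cases:
print(round(10 / three_wide, 2), round(10 / four_wide, 2))  # 0.56 0.59
```

Note how going from a 3-wide to a 4-wide machine saves only one cycle out of ~18 here; the stall dominates, which is the whole point.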

That's the reason why OOO loads, improved prefetchers (including those in the NB), the 32-way L3 cache, separate memory channels and so on are important in this equation, while things like LZCNT/POPCNT won't do anything for Barcelona in its first quarters.

BTW, Scientia from AMDZone also listened to this interview (completely, while I didn't) and found:
One interesting comment was that Intel's current memory access latency is 100ns while AMD's is 70ns and he suggested that AMD will be at 55ns soon. I assume he was talking about K10. I'm now wondering where the improvement will come from. I wonder if this could be related to the new direct to L1 prefetch versus the older load to L2 prefetch.

Here I just think that 15ns (30+ cycles) would be too much of a saving to come just from skipping the L2 access for prefetched data.
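For reference, the ns-to-cycles conversion behind that "30+ cycles" figure is straightforward; the 2.3 GHz clock below is my assumption for a Barcelona-class part, not something stated in the interview:

```python
def ns_to_cycles(latency_ns, clock_ghz):
    """Convert a latency in nanoseconds to core clock cycles."""
    return round(latency_ns * clock_ghz, 1)

# Assuming a ~2.3 GHz core clock (my assumption, not from the post):
print(ns_to_cycles(15, 2.3))  # 34.5 -- roughly the "30+ cycles" above
print(ns_to_cycles(70, 2.3))  # 161.0 (AMD's current latency per Scientia)
print(ns_to_cycles(55, 2.3))  # 126.5 (the suggested future figure)
```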

*) At 1/3 of full length of the video interview.