To: Ali Chen who wrote (227214) 3/2/2007 8:52:23 AM
From: DDB_WO

Ali - Amato said 3.2X *)

You made a good point about IPC:

"Speaking about theoretical constructions, I remember AMD presentations of K6 capabilities to run 4 instructions per clock. In reality, this number rarely exceeded something like 0.2. This is a typical range of disconnect between an architect world and reality. I think the statements from AMD marketing officials are very irresponsible."

This is what most people seem to miss: IPC is not the max. sustainable throughput of a CPU's decoding/execution/retirement, but a property of the system as a whole. If there are too many cache misses, mispredicted branches etc., a wonderful assumed IPC of 3 or 4 can drop sharply. I remember Mitch Alsup stating an IPC of 1.0 for Opteron (assumed, but likely close to the mark) in a Usenet discussion.

E.g. (for everyone else, not you, since you understand this concept): if a chunk of code (say 10 instructions) relies on some data which isn't in the L1, then the code stalls for e.g. 14 cycles until it can continue. If execution has a throughput of 3 IPC, this sequence takes 14+ceil(10/3)=18 cycles on the 3 IPC machine and 14+ceil(10/4)=17 cycles on the 4 IPC machine.

That's the reason why OOO loads, improved prefetchers (+ those in the NB), the 32-way L3 cache, separate memory channels and so on are important in this equation, while things like LZCNT/POPCNT won't do anything for Barcelona in the first quarters.

BTW, Scientia from AMDZone also listened to this interview (completely, while I didn't) and found:

"One interesting comment was that Intel's current memory access latency is 100ns while AMD's is 70ns and he suggested that AMD will be at 55ns soon. I assume he was talking about K10. I'm now wondering where the improvement will come from. I wonder if this could be related to the new direct to L1 prefetch versus the older load to L2 prefetch."
Here I just think that 15ns (30+ cycles) would be too much to gain just by saving the L2 access for prefetched data.

*) At 1/3 of the full length of the video interview.
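
For anyone who wants to replay the stall arithmetic from the post, here is a minimal Python sketch. The function names are made up for illustration, the model is the same simplification as in the example (one stall up front, then issue at peak width), and the 2.0 GHz clock used for the ns-to-cycles conversion is my assumption, not a figure from the interview.

```python
import math


def sequence_cycles(instructions: int, peak_ipc: int, stall_cycles: int) -> int:
    """Cycles for a block that waits out one stall, then issues at peak IPC.

    Simplified model from the post: stall + ceil(instructions / peak_ipc).
    """
    return stall_cycles + math.ceil(instructions / peak_ipc)


def effective_ipc(instructions: int, peak_ipc: int, stall_cycles: int) -> float:
    """Instructions actually completed per cycle once the stall is included."""
    return instructions / sequence_cycles(instructions, peak_ipc, stall_cycles)


def ns_to_cycles(latency_ns: float, clock_ghz: float) -> float:
    """Convert a latency in nanoseconds to core cycles at a given clock."""
    return latency_ns * clock_ghz


# 10 dependent instructions behind a 14-cycle L1 miss, as in the post.
for peak in (3, 4):
    cycles = sequence_cycles(10, peak, 14)
    print(f"peak IPC {peak}: {cycles} cycles, "
          f"effective IPC {effective_ipc(10, peak, 14):.2f}")

# 70ns -> 55ns is a 15ns saving; 2.0 GHz is an assumed clock, not from the source.
print(f"15ns at 2.0 GHz = {ns_to_cycles(15, 2.0):.0f} cycles")
```

With these numbers the wider machine saves only one cycle out of 18, and the effective IPC stays well below 1 on both, which is the point of the example: the stall dominates, not the issue width.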