To: ptanner who wrote (73340) 3/5/2002 4:12:52 AM
From: peter_luc

PT, thread,

Hammer performance estimate by Paul DeMone, RWT Discussion Forum Home. See realworldtech.com

"Topic: Hammer performance estimate
Name: Paul DeMone (pdemone@igs.net) 2/28/02

I have been asked by several people for performance estimates of AMD's Hammer series of processors so I'll give it a shot. Keep in mind that x86-64 is a new microarchitecture so there is wide room for error on any or all of these factors:

Reference point:
- K7 XP 2000+
- at or near end of performance scaling in 0.18 um bulk CMOS
- 1667 MHz, ~700 SPECint_base2k, ~600 SPECfp_base2k

Hammer top bin clock rate (early/mature):
- 5%/5% bump from 12 stage pipeline (extra stages mostly for IPC gain and for handling extra complexity of x86-64)
- 20%/25% gain from 0.13 um (wire limitation, limited Leff reduction from late model 0.18 um K7s vs use of 0.09 um FET techniques)
- 10%/15% gain from SOI

Total +35% early, +45% mature

Microarchitectural gains:
- biggest difference is on-chip memory controller. If we assume best of class K7 chip sets average 100 ns access for ~50% page hit mix and moderate traffic and integrating the memory controller shaves 30 ns (probably a bit generous) off read latency, and integer app performance scalability is 60%, then speedup is approximately 1/(0.6 + 0.4*(70/100)) or 19%. Assume other efficiencies like better buffering and round that to 20%. For larger cache/wider memory Sledgehammer, I'll say 25% bump for integer apps. FP apps are much more bandwidth sensitive than latency sensitive so I'll apportion 5%/40% for Claw/Sledge.
- The improved front end I'll apportion 5%/0% for int/FP apps.
- for x86-64 compiled apps I'll apportion 5/10% for int/FP apps from increased number of GPRs available and other efficiencies.

So "IPC" improvements relative to XP (with x86-64 recompilation):

Clawhammer:
int: 20% MC + 5% FE + 5% x86-64 = 30%
FP: 5% MC + 0% FE + 10% x86-64 = 15%

Sledgehammer:
int: 25% MC + 5% FE + 5% x86-64 = 35%
FP: 40% MC + 0% FE + 10% x86-64 = 50%

SPECint/fp_base2k estimates (assume 70%/50% int/FP perf scaling with F) with full x86-64 recompilation:

Early top bin (+35%, ~2250 MHz)
Claw: 1150 / 800
Sledge: 1200 / 1050

Mature top bin (+45%, ~2400 MHz)
Claw: 1200 / 850
Sledge: 1250 / 1100

If the Hammers are running generic or P4 optimized 32 bit x86 code then I would discard the x86-64 IPC bump and cut the FE bump in half. That will reduce the performance by about 6 to 8%.

FWIW a 3400 MHz XP would probably score roughly around 1050 SPECint_base2k so if Hammer's model rating number was based on SPECint then a 3400+ Clawhammer would clock around 2 GHz. Conversely, a 2.25 GHz Claw would rate around a 4000+ rating.

Now remember folks that is a 15 minute, back of an envelope calculation/estimate/WAG and 5 minutes was taken to find the envelope. ;-)"

In a later post in the same thread he makes the following statement:

"> >Where do y'all think the P4 will be by then?

Here are my estimates/WAGs:

P4/3000 400 FSB 1000/900
P4/2933 533 FSB 1100/1050
P4/3467 533 FSB 1200/1150

Intel should have the advantage until Hammer ships, then it will be tight again. IIRC the 0.09 um P4 ships in 2H03 and then Intel should once again have the advantage."

What is your opinion about it? What do you especially think about his assumption that there will only be a 5% clock speed bump from the 12 stage pipeline (extra stages mostly for IPC gain and for handling extra complexity of x86-64)?

If all AMD can hope for is to be on par with Intel for 2-3 quarters and then fall behind again, the future would not look too bright... OTOH, the messages on the RWT discussion forum are quite often slightly anti-AMD (or let's say definitely pro-Intel).
Those who make positive remarks about AMD are often characterized as "AMD partisans" (or similar).

Peter
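
For anyone who wants to check the arithmetic in the quoted estimate, here is a minimal Python sketch. The inputs (1667 MHz, ~700/~600 SPEC base scores, the +35%/+45% clock gains, the 70%/50% scaling with frequency and the IPC percentages) are taken from the post. The way they are combined in `spec_estimate` is an assumption of this sketch, since the post does not spell out the formula, and the function and variable names are made up for illustration. Under that assumption the results land within a few percent of the quoted figures. Note that the memory-latency expression, evaluated exactly as written, comes out near 14% rather than the quoted 19%, so the original may have used slightly different inputs.

```python
# Back-of-envelope check of the quoted Hammer estimate.
# Input numbers come from the post; the combination rule is an ASSUMED
# reconstruction (the post does not state its exact formula).

BASE_MHZ = 1667                                # K7 XP 2000+ reference clock
BASE_SPEC = {"int": 700, "fp": 600}            # ~SPECint/fp_base2k of the reference
CLOCK_SCALING = {"int": 0.70, "fp": 0.50}      # share of a clock gain that shows up as perf
CLOCK_GAIN = {"early": 0.35, "mature": 0.45}   # 5+20+10 and 5+25+15, summed as in the post
IPC_GAIN = {
    "Claw": {"int": 0.30, "fp": 0.15},
    "Sledge": {"int": 0.35, "fp": 0.50},
}
# Figures quoted in the post, used here only for comparison.
QUOTED = {
    ("Claw", "early"): (1150, 800),
    ("Sledge", "early"): (1200, 1050),
    ("Claw", "mature"): (1200, 850),
    ("Sledge", "mature"): (1250, 1100),
}


def latency_speedup(core_fraction, new_latency_ns, old_latency_ns):
    """Evaluate 1/(core + memory * new/old), the expression given in the post."""
    memory_fraction = 1.0 - core_fraction
    return 1.0 / (core_fraction + memory_fraction * new_latency_ns / old_latency_ns)


def spec_estimate(chip, kind, bin_name):
    """Assumed model: base score * (1 + scaling * clock gain) * (1 + IPC gain)."""
    clock_factor = 1.0 + CLOCK_SCALING[kind] * CLOCK_GAIN[bin_name]
    ipc_factor = 1.0 + IPC_GAIN[chip][kind]
    return BASE_SPEC[kind] * clock_factor * ipc_factor


print(f"latency speedup as written: {latency_speedup(0.6, 70, 100):.3f}")
for (chip, bin_name), (q_int, q_fp) in QUOTED.items():
    mhz = BASE_MHZ * (1.0 + CLOCK_GAIN[bin_name])
    est_int = spec_estimate(chip, "int", bin_name)
    est_fp = spec_estimate(chip, "fp", bin_name)
    print(f"{chip:6s} {bin_name:6s} ~{mhz:.0f} MHz: "
          f"{est_int:.0f} / {est_fp:.0f} (quoted {q_int} / {q_fp})")
```

Changing `CLOCK_GAIN` or the pipeline contribution inside it is the quickest way to see how sensitive the final scores are to the 5% clock bump assumption questioned above.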