Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: Joe NYC who wrote (76509) 4/4/2002 2:51:29 PM
From: combjelly
 
"it is in direction of benchmarks becoming a glorified stream benchmark."

But even in this scenario, ClawHammer will have its advantages. A lot of attention has been paid to the whole chain: not only will latency to memory be better, so will latency to L2. Assuming I am reading the slides properly, it looks as if the latency to L2 will be 8 cycles, as compared to 20-some-odd for the current Athlons. If they went to that effort, they also likely made the L2 cache a true dual-ported one and likely increased the bandwidth to the L2. With L1 cache evictions able to run in parallel with cache reads, combined with a possibly improved prefetch and higher bandwidth to L2, the data starvation that the current Athlon core suffers from is going to be lessened. And that will have a big impact on SPEC, especially SPECfp. AMD was estimating a SPECint score of 1400 at 2GHz, so I suspect that Hammers will do just fine on STREAM-like benchmarks.



To: Joe NYC who wrote (76509) 4/4/2002 8:21:39 PM
From: pgerassi
 
Dear Joe:

You are trying to see Intel's future CPUs and memory without allowing AMD to improve. Currently only PC800 exists, not PC1066; 100MHz QDR, not 133MHz QDR. But PC3200 is available for purchase now. DDR-I/400 attached to Clawhammer equals P4's entire 100MHz QDR bandwidth. 133MHz QDR, if and when it is released, may allow P4 to have higher theoretical bandwidth, but with a simple change to DDR-II support even that goes away (PC4800). For 2 way, Clawhammer will have higher theoretical bandwidth at 5.4GB/sec (for dual PC2700; dual PC3200 has 6.4GB/sec), even if half of it has 1 added cycle of latency. 2 way NW still sees only 4.2GB/sec of bandwidth.
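
As a rough check of the bandwidth figures above, here is a minimal Python sketch. It assumes a 64-bit (8-byte) data path for both the P4 front-side bus and each DDR channel; the function and variable names are illustrative only and do not come from the post.

```python
# Peak theoretical bandwidth, assuming a 64-bit (8-byte) data path everywhere.
# All names here are hypothetical and used only for illustration.

def peak_gb_per_s(clock_mhz, transfers_per_clock, channels=1, bus_bytes=8):
    """Return peak bandwidth in decimal GB/s."""
    return clock_mhz * 1e6 * transfers_per_clock * bus_bytes * channels / 1e9

p4_fsb_100 = peak_gb_per_s(100, 4)                    # 100MHz QDR bus
p4_fsb_133 = peak_gb_per_s(400 / 3, 4)                # 133MHz QDR bus
pc3200 = peak_gb_per_s(200, 2)                        # DDR400, single channel
dual_pc2700 = peak_gb_per_s(500 / 3, 2, channels=2)   # DDR333, dual channel
dual_pc3200 = peak_gb_per_s(200, 2, channels=2)       # DDR400, dual channel

for name, value in [
    ("P4 100MHz QDR", p4_fsb_100),
    ("P4 133MHz QDR", p4_fsb_133),
    ("single PC3200", pc3200),
    ("dual PC2700", dual_pc2700),
    ("dual PC3200", dual_pc3200),
]:
    print(f"{name}: {value:.2f} GB/s")
```

Computed from raw clocks, dual PC2700 and 133MHz QDR come out near 5.33 and 4.27 GB/sec; the 5.4 and 4.2 quoted above follow the nominal module ratings and rounding.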

All this supposes that latency adds or loses nothing. I do not subscribe to that being the case. Single channel PC2700 will outrun dual PC1066 or dual PC2100 in actual bandwidth available. In most applications, where the needed bandwidth is less than half that (<1GB/sec typ), latency is the sole criterion, because once bandwidth needs are met, more does nothing, while getting the data faster always speeds things up. With typical FSB/NB delays, 2 cycles (1 each way) over 9 cycles (for 1T 2-2-2 DDR DRAM) means a 22% speed up for on chip (worst case; 5 cycles is the best case, at 42%) versus a NB, without contention being added (which is not caught by current memory bandwidth tests). Adding contention just adds to the advantage of on chip (another cycle or two, for a total of 34% to 46% (80% best case)). These are with DDR. One more thing: a cache line fill on NW is 128 bytes, twice that of Clawhammer. Without locality, that hurts NW latency by adding at least a cycle even with dual channel, or 5 cycles with single channel.

So you may worry until Clawhammer comes out, but I think you will find that Clawhammer will be fast enough to outrun NW in all but a tiny fraction of tests, mostly synthetic. Besides, how much do you think people would pay for dual DDR over single DDR on the desktop? Would that pay for all of the additional MB/CPU costs (pins, packaging, die size, slots, traces, layers, etc.)? Including maintaining gross margin?

The problem is solved for workstations, since they would use the 2 way version, which is dual DDR by definition, or Sledgehammer Nx2xDDR.

Let's look at die sizing. Clawhammer has 1 core, 512KB of L2, 128KB L1, single DDR, 4 way HT switch in 104mm2. Your Sledgehammer (I call it Clawhammer DP; my SH is DCDC (dual core dual channel)) has 1 core, 1MB L2, 128KB L1, dual DDR, 5 way HT switch in 104mm2 + 512KB + 1 DDR + 1 HT way. 192KB on 0.18u is 24mm2 (the difference between Duron and AXP), or 8KB/mm2 on 0.18u. On 0.13u that should be 16KB/mm2, so 32mm2 should cover 512KB. I think this will be lower, because some of that difference is something other than L2 cache (blank layout area IMHO, like current NW): a true 15mm2 gives 13KB/mm2, or 26KB/mm2 at 0.13u, adding only 20mm2. Add 4mm2 for 1 DDR and 1mm2 for a HT link. Total 141mm2 (129mm2 at the higher density estimate).
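
The density-based estimate above can be retraced with a short Python sketch. It assumes, as the post does, that density doubles going from 0.18u to 0.13u (ideal scaling would be about 1.92x); the constant and function names are hypothetical.

```python
# Die-size estimate from cache density; names are illustrative only.
BASE_DIE_MM2 = 104.0   # Clawhammer die size quoted in the post
EXTRA_DDR_MM2 = 4.0    # one more DDR channel
EXTRA_HT_MM2 = 1.0     # one more HT link
SHRINK_GAIN = 2.0      # assumed density gain from 0.18u to 0.13u

def extra_l2_area(cache_kb, area_mm2_at_018, extra_kb=512):
    """Area of extra_kb of L2 on 0.13u, from an observed 0.18u data point."""
    density_018 = cache_kb / area_mm2_at_018   # KB per mm2 on 0.18u
    density_013 = density_018 * SHRINK_GAIN    # KB per mm2 on 0.13u
    return extra_kb / density_013

# Low-density case: the full 24mm2 Duron/AXP difference is L2.
l2_low = extra_l2_area(192, 24)
# High-density case: only 15mm2 of that difference is really L2.
l2_high = extra_l2_area(192, 15)

total_low = BASE_DIE_MM2 + l2_low + EXTRA_DDR_MM2 + EXTRA_HT_MM2
total_high = BASE_DIE_MM2 + l2_high + EXTRA_DDR_MM2 + EXTRA_HT_MM2
print(f"low density:  {total_low:.0f} mm2")
print(f"high density: {total_high:.0f} mm2")
```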

Trying another way, 512KB with ECC is equal to 4718592 bits, or about 4.8mm2 per um2/bit of SRAM cell size (taking 1um2/bit here). Multiply by 3 to add busses for power, address and data, plus logic, and you get 14.4mm2 per 512KB ECC L2. Adding this estimate, plus the DDR channel and HT link, to the 104mm2 die size above nets 123.4mm2, my original estimate.
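
The bit-count route can be sketched the same way in Python. The assumptions here are that ECC adds one check bit per data byte (9 bits stored per byte, which reproduces the 4718592 figure) and that the cell size is 1um2 per bit; all names are hypothetical.

```python
# SRAM-cell-based estimate; names and the 1um2/bit cell size are assumptions.
BITS_PER_BYTE_WITH_ECC = 9   # 8 data bits + 1 ECC bit
CELL_UM2_PER_BIT = 1.0       # assumed SRAM cell size
OVERHEAD_FACTOR = 3          # busses for power, address, data, plus logic

ecc_bits = 512 * 1024 * BITS_PER_BYTE_WITH_ECC
raw_cell_mm2 = ecc_bits * CELL_UM2_PER_BIT / 1e6   # 1 mm2 = 1e6 um2

# The post rounds the raw cell area up to 4.8mm2 before multiplying.
l2_mm2_post = 4.8 * OVERHEAD_FACTOR
total_mm2_post = 104 + l2_mm2_post + 4 + 1   # base die + L2 + DDR + HT link

print(ecc_bits)
print(f"raw cell area: {raw_cell_mm2:.2f} mm2")
print(f"total with post's rounding: {total_mm2_post:.1f} mm2")
```

Without rounding up to 4.8mm2, the raw cell area is closer to 4.72mm2, so the total would land slightly under the 123.4mm2 quoted.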

Backtracking yields 104mm2 - 4mm2(DDR) - 2x1mm2(HT) - 14.4mm2(512KB ECC L2) - 3.6mm2(128KB ECC L1) - 4mm2(HT switch and link logic) equals 76mm2 for a core. SH DCDC makes 2x76mm2(cores) + 4mm2(HT switch and link logic) + 2x3.6mm2(L1s) + 2x28.8mm2(L2s) + 2x4mm2(DDRs) + 3x1mm2(HTs) = 231.8mm2 (so I was a little off).
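
The backtracking sum above is easy to retrace; this Python sketch just replays the arithmetic with the component areas given in the post (the constant names are illustrative).

```python
# Component areas in mm2, taken from the post; names are hypothetical.
DIE = 104.0
DDR = 4.0
HT_LINK = 1.0
L2_512K = 14.4
L2_1M = 28.8
L1_128K = 3.6
HT_SWITCH = 4.0   # HT switch and link logic

# Core area: what remains of Clawhammer after removing everything else.
core = DIE - DDR - 2 * HT_LINK - L2_512K - L1_128K - HT_SWITCH

# Dual core, dual channel Sledgehammer (SH DCDC).
sh_dcdc = (2 * core + HT_SWITCH + 2 * L1_128K
           + 2 * L2_1M + 2 * DDR + 3 * HT_LINK)

print(f"core:    {core:.1f} mm2")
print(f"SH DCDC: {sh_dcdc:.1f} mm2")
```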

The low number of Durons is probably due to MB costs, which are coming down, and the Intel price war, which lowered AXP prices. We will know more wrt Durons at the Q1 CC soon. Much depends on the direction of Duron unit sales. Up, and AMD's decision makes a lot of sense till Flash goes great guns again; down, and they should reconsider Duron on the desktop.

To your Clawhammer performance point, I think that Intel will have to speed up NW to more than 3.7GHz to match a 3400+ rated Clawhammer in 32 bit mode. Remember, NW will no longer have an SSE2 advantage to fall back on (assuming others make no more WME 7.01 like errors). Prescott may make it there, but by then Clawhammer will be faster and using DDR2 (roadmap of 4400+ at last release). Besides, Clawhammer will not be the flagship beyond Q2-03; that role is for Sledgehammer. All of the benchmarks will be between SH and Prescott, CH and NW, or Barton and P4 Celeron. Clawhammer will be fine against NW at the same time. In 64 bit mode, CH will rule even against Prescott Yamhill, and SHG just adds icing onto the cake. Beyond 1 way, all Hammers will rule over Yamhill unless Intel takes drastic measures and tries something new, not P4 or IA-64.

Would you do anything more with Athlon if Hammers either come early or beat the current assumptions on time? Once this is assured, AMD probably will change its strategy. A definitely good CH, SH and SHDC may make Athlon go away, the way AXP took over from Tbird. If the comparisons match SHDC vs Intel's flagships, SH vs NW and CH vs Celeron, will you accept CH as is (allowing upgrades to the mainstream memory type)?

Pete