Technology Stocks : Advanced Micro Devices - Moderated (AMD)

To: Ali Chen who wrote (227237)
Date: 3/2/2007 11:24:51 AM
From: fastpathguru
 
Nonsense???. What the hell a difference between 3.6 or 3.2 can make if the actual performance will not reach even 1/10th
of this?

...

I am forgetting nothing. It is you who is not realizing that only a very special algorithm can keep busy all 8 FP units all time, and this algorithm is definitely not all over the SPEC benchmarks. Then you will run into BW issues. You also apparently didn't get my remark that full utilisation of the FP units requires special compiler, which is unlikely to be near soon given current AMD financial state and limited resources. In the mean time don't make a mistake thinking that Intel would miss an opportunity to support their new SSE4 (or whatever) in current version of their home-grown compiler, and will arrive with even better public image about their performance.


WRONG. On basically all counts.

First: Bandwidth from the cache hierarchy to the exec cores has been doubled, the instruction fetch size has been doubled, and the width of the SSE units has been doubled. The only thing that wasn't doubled was main memory bandwidth, but that was already under-utilized.

Second: The double-wide SSE units DO NOT need new compilers to be fully exercised. AMD didn't introduce new double-wide instructions to the ISA; what will happen with the new exec resources is that the current ISA will be more completely supported in hardware. It was the K8's 64b-wide FP units that were hobbling performance by breaking 128b instructions into pairs of 64b operations.
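
To put the second point in concrete terms, here is a toy Python model (a sketch, not a simulator) of how many internal operations a stream of 128b SSE instructions costs on a 64b-wide unit versus a 128b-wide one. The function names and the 1000-instruction stream are made up for illustration; the only inputs taken from the argument above are the 64b and 128b widths.

```python
# Toy model only: counts internal FP operations, ignores latency,
# scheduling, and memory effects. All names here are hypothetical.

def ops_per_instruction(instr_bits: int, unit_bits: int) -> int:
    """Internal operations needed to execute one instruction on a unit
    of the given width (ceiling division)."""
    return -(-instr_bits // unit_bits)

def total_ops(n_instructions: int, instr_bits: int, unit_bits: int) -> int:
    """Internal operations for a stream of identical instructions."""
    return n_instructions * ops_per_instruction(instr_bits, unit_bits)

# The same unmodified stream of 128b SSE instructions on both designs.
N = 1000  # arbitrary illustrative instruction count
k8_ops = total_ops(N, instr_bits=128, unit_bits=64)     # split into 64b pairs
wide_ops = total_ops(N, instr_bits=128, unit_bits=128)  # executed whole

print(k8_ops, wide_ops)
```

The instruction stream is identical in both calls, which is the point: the binary doesn't change, only the number of internal operations the hardware needs to get through it.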

Third: Benchmarks that scale with cores will automatically be able to take advantage of 4-core chips by launching more threads, just like they do to exercise the extra performance of single- to dual-core and 1S to 2S to xS transitions.

Bottom line: Barcelona, with 2x cores and 2x FP units WILL be able to basically quadruple (minus scaling inefficiencies which are of lesser magnitude than the equivalent dual->quad Intel scaling inefficiencies) certain FP-heavy benchmarks versus a vanilla K8.
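
The arithmetic behind the bottom line can be sketched in a few lines of Python. This is a peak-throughput estimate only; the `scaling_efficiency` parameter and the 0.9 value are hypothetical placeholders for the "scaling inefficiencies" mentioned above, not measured numbers.

```python
# Peak-throughput estimate only. The efficiency factor is a made-up
# placeholder, not a measurement of any real chip.

def peak_fp_speedup(core_ratio: float, fp_width_ratio: float,
                    scaling_efficiency: float = 1.0) -> float:
    """Ideal FP speedup from more cores and wider FP units, scaled by
    an assumed multi-core efficiency between 0 and 1."""
    return core_ratio * fp_width_ratio * scaling_efficiency

# 2x cores and 2x FP width versus a vanilla dual-core K8.
ideal = peak_fp_speedup(core_ratio=2, fp_width_ratio=2)

# Same ratios with an assumed (hypothetical) 10% scaling loss.
with_losses = peak_fp_speedup(2, 2, scaling_efficiency=0.9)

print(ideal, with_losses)
```

This only applies to benchmarks that are FP-bound, fit in cache, and scale with thread count; anything limited by main memory bandwidth won't follow this model.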

fpg