Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: Ali Chen who wrote (227237) | 3/2/2007 11:05:54 AM
From: combjelly
 
"Then you will run into BW issues. "

True. But K8 can't use all the bandwidth that became available when they moved to the new sockets. So...



To: Ali Chen who wrote (227237) | 3/2/2007 11:24:51 AM
From: fastpathguru
 
"Nonsense???. What the hell a difference between 3.6 or 3.2 can make if the actual performance will not reach even 1/10th of this?

...

I am forgetting nothing. It is you who is not realizing that only a very special algorithm can keep busy all 8 FP units all time, and this algorithm is definitely not all over the SPEC benchmarks. Then you will run into BW issues. You also apparently didn't get my remark that full utilisation of the FP units requires special compiler, which is unlikely to be near soon given current AMD financial state and limited resources. In the mean time don't make a mistake thinking that Intel would miss an opportunity to support their new SSE4 (or whatever) in current version of their home-grown compiler, and will arrive with even better public image about their performance."


WRONG. On basically all counts.

First: Bandwidth from the cache hierarchy to the exec cores has been doubled, the instruction fetch size has been doubled, the width of the SSE units is doubled. The only thing that wasn't doubled was main memory bandwidth, but that was already under-utilized.

Second: The double-wide SSE units DO NOT need new compilers to be fully exercised. AMD didn't introduce new double-wide instructions to the ISA; what will happen with the new exec resources is that the current ISA will be more completely supported. It was the K8's 64b-wide FP units that were hobbling performance by breaking 128b instructions into pairs of 64b operations.

Third: Benchmarks that scale with cores will automatically be able to take advantage of 4-core chips by launching more threads, just like they do to exercise the extra performance of single- to dual-core and 1S to 2S to xS transitions.

Bottom line: Barcelona, with 2x cores and 2x FP units WILL be able to basically quadruple (minus scaling inefficiencies which are of lesser magnitude than the equivalent dual->quad Intel scaling inefficiencies) certain FP-heavy benchmarks versus a vanilla K8.
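
To put rough numbers on that bottom line, here is a minimal sketch of the arithmetic. It assumes a dual-core K8 with 64b-wide FP units as the baseline and treats FP throughput as simply proportional to core count times FP unit width. The function name and the scaling_efficiency parameter are hypothetical, invented for illustration; no measured efficiency figure is implied.

```python
def relative_fp_throughput(cores, fp_width_bits, scaling_efficiency=1.0,
                           base_cores=2, base_fp_width_bits=64):
    """Idealized FP throughput relative to a baseline chip.

    Assumption: throughput scales with cores * FP unit width, then is
    reduced by a multi-core scaling efficiency between 0 and 1.
    """
    core_factor = cores / base_cores                   # 2x cores -> 2.0
    width_factor = fp_width_bits / base_fp_width_bits  # 64b -> 128b -> 2.0
    return core_factor * width_factor * scaling_efficiency


# Perfect scaling: 2x cores and 2x FP width gives 4x the baseline.
ideal = relative_fp_throughput(cores=4, fp_width_bits=128)

# With a hypothetical 90% scaling efficiency the gain is somewhat below 4x.
with_losses = relative_fp_throughput(cores=4, fp_width_bits=128,
                                     scaling_efficiency=0.9)
print(ideal, with_losses)
```

The model only applies to FP-heavy code that already scales across cores and uses 128b SSE instructions; it says nothing about workloads limited by main memory bandwidth.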

fpg



To: Ali Chen who wrote (227237) | 3/2/2007 11:56:59 AM
From: Petz
 
re: It is you who is not realizing that only a very special algorithm can keep busy all 8 FP units all time, and this algorithm is definitely not all over the SPEC benchmarks.

No, it is you who is forgetting that OVER THE ENTIRE SPEC RATE BENCHMARK, the "result" scales almost linearly with the number of CPUs on Opteron processors.

4 x DC ~= 2* (2xDC) ~= 4* (1xDC) ~= 8*SC
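
The relation above can be sketched as a tiny idealized model, assuming perfectly linear scaling (the post says "almost"). The function name and per_core_score are hypothetical placeholders, not real SPEC numbers.

```python
def spec_rate_estimate(sockets, cores_per_socket, per_core_score=1.0):
    """Idealized SPEC rate result: proportional to the total core count.

    Assumption: perfectly linear scaling, with per_core_score as an
    arbitrary placeholder unit rather than a measured value.
    """
    return sockets * cores_per_socket * per_core_score


four_socket_dc = spec_rate_estimate(4, 2)           # 4 x DC
two_of_two_socket_dc = 2 * spec_rate_estimate(2, 2)  # 2 * (2xDC)
four_of_one_socket_dc = 4 * spec_rate_estimate(1, 2)  # 4 * (1xDC)
eight_single_core = 8 * spec_rate_estimate(1, 1)      # 8 * SC

# Under the linear assumption all four configurations come out equal.
print(four_socket_dc, two_of_two_socket_dc,
      four_of_one_socket_dc, eight_single_core)
```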

Petz



To: Ali Chen who wrote (227237) | 3/2/2007 6:23:02 PM
From: pgerassi
 
Dear Ali:

"You also apparently didn't get my remark that full utilisation of the FP units requires special compiler, which is unlikely to be near soon given current AMD financial state and limited resources."

Out-and-out lie! It's already available and has been for a year: Sun's Studio 11 C compiler. You can see its benefits in SPECfp_2000. The more cores you add, the higher the SPECfp_2000 score gets. It has a higher score than any C2D product.

aceshardware.com

It got 3538 for 4 Opteron 856s at 3GHz. The highest C2D got was 3056. It was beaten by a single 2.8GHz Opteron 1220. The Opterons even beat Itanium. The only CPU that wasn't beaten was a 2.3GHz Power 5+, and the 2 DC 2.1GHz Power 5+ (4 cores) was the top at over 4K. FYI, IBM also has a parallelizing C compiler to get the above. So those kinds of compilers are already available.

Now take your medicine and agree that you were WRONG!

Pete