Anand: Barcelona vs. Xeon/"Stoakley"/Harpertown and Xeon/"Bensley"/Clovertown.
intro: anandtech.com configuration: anandtech.com
Believe it or not -- Barcelona more than holds its own, especially in server benches.
Setup info -- used a "B2" stepping of Barcelona, which was found to have only slightly better memory performance.
He (and 3 Intel consultants) couldn't get DDR2-800 FBDIMMs to work with either Intel platform, so 8GB (4x2GB) of Crucial DDR2-667 CL5 ECC (FBDIMMs for Intel) was used for all 3 platforms. He also couldn't get the 45nm Harpertown CPUs working on the older Bensley platform, despite the latest BIOS and Intel's webpage, which says it should work.
The 2.5 GHz Barc was running at only 1.2v.
Latency, bandwidth, memory testing
Anand did a lot of testing of L3 and found its latency was 43-48 cycles vs. 130-170 for main memory. The Asus KFSN4-DRE motherboard uses 2 HT links between sockets to improve performance, but only 1 is coherent. He measured a 50% improvement in L2 bandwidth for Barc 23XX vs. Opteron 22XX. Bandwidth of each Barc L2 is almost equal to Intel's shared L2 bandwidth. Barc single-thread memory bandwidth is 26-50% higher than the Intel platforms. Barc multithreaded Stream bandwidth is 36% higher than 22XX Opterons. Barc 23XX multithreaded memory bandwidth is 223% of the Xeon Stoakley platform.
LINPACK (matrix size 30,000)

CPU                              GFLOPS
Xeon 5472 3 GHz                  60
Xeon 5365 3 GHz                  57.1
Opteron 2360SE 2.5 GHz           55
Xeon 5345 2.33 GHz               53.9
Opteron 2350 2.0 GHz             46.2
Opteron 2350 (Intel compiler)    43.7

At first, this looks bad for AMD. But notice that Intel scales very poorly with frequency, while AMD scales well. A 29% increase in Intel clock speed (Xeon 5345 to 5365) only buys 6% of performance. A 25% increase in AMD clock speed (Opteron 2350 to 2360SE) buys 19% of performance.

Anand mentions that Intel just released a newer compiler version and math library version that increases performance, but apparently he couldn't get ahold of the library source code and he did not use it. That's OK, the PGI compiler wasn't the latest version either, since the latest SPEC scores are using version 7.1 and he used version 7.0.7.

Another thing to notice is that the Barcelona runs almost as well using the Intel libraries and compiler. In fact, for matrix sizes of "only" 5000, the Barcelona runs 14% faster on the Intel compiler/library than on the "optimized" PGI/AMD Core Math Lib version!
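The LINPACK scaling percentages quoted above can be reproduced from the GFLOPS figures alone. Here is a minimal Python sketch; the "efficiency" ratio (performance gain divided by clock gain) and the function name clock_scaling are illustrative assumptions, not something taken from Anand's review.

```python
# LINPACK GFLOPS at matrix size 30,000, as quoted in this post.
# Each entry: (clock in GHz, GFLOPS) for the slower and faster part.
RESULTS = {
    "Intel Clovertown (Xeon 5345 -> 5365)": ((2.33, 53.9), (3.0, 57.1)),
    "AMD Barcelona (Opteron 2350 -> 2360SE)": ((2.0, 46.2), (2.5, 55.0)),
}


def clock_scaling(low, high):
    """Return (clock gain, performance gain, efficiency) as fractions.

    Efficiency is a hypothetical helper metric: 1.0 would mean the
    performance gain exactly matches the clock gain.
    """
    (f_low, p_low), (f_high, p_high) = low, high
    clock_gain = f_high / f_low - 1
    perf_gain = p_high / p_low - 1
    return clock_gain, perf_gain, perf_gain / clock_gain


for name, (low, high) in RESULTS.items():
    clock_gain, perf_gain, efficiency = clock_scaling(low, high)
    print(f"{name}: +{clock_gain:.0%} clock, +{perf_gain:.0%} GFLOPS, "
          f"efficiency {efficiency:.2f}")
```

On these numbers Intel turns roughly a fifth of its clock increase into LINPACK performance, while AMD turns about three quarters of it into performance.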
Rendering (3DSMax) (32-bit Windows)
3DSMax runs almost perfectly from L1 and L2 cache, both instructions and data, so superior memory latency and bandwidth can do nothing for Barcelona. The 3 GHz E5365 (Clovertown) has a 23% lead over the 2.5 GHz Opteron 2360SE. The lead increases to 33% with Harpertown. The only possible quibbles with this test are whether everything in 3DSMax is always so cache-bound, and the use of 32-bit Windows.
Software Rendering: zVisuel (32-bit Windows)
zVisuel gives Intel an unfair advantage since it is "a benchmark which is very SSE intensive and which is optimized for Intel CPUs." (Anand, original Barc review.) Barcelona nevertheless beats Clovertown.

CPU               Score    Notes
E5472 3.00 GHz    124.8
2360SE 2.5 GHz    107      25% clock buys 25%+ performance (B2 stepping?)
E5365 3.00 GHz    99.9     29% clock buys 15% performance
E5345 2.33 GHz    86.5
2350 2.00 GHz     85
Anand reports additional tests which add a software antialiasing filter designed around the Intel "super shuffle engine." Read it yourself to see the predictable results. It still can't come close to the Barcelona's perfect scaling w.r.t. clock speed.
SPECjbb2005 Linux 64 Performance, Sun Java VM
Suddenly, for this one test, Anand decides to include an 800 MHz FBDIMM result: "It took us hours, but we managed to complete one run of SPECjbb with faster 800MHz DDR." How sleazy can you get? It barely beat the Opteron 2350s anyway. He also managed to get Clovertown to work in the "Stoakley" platform.

CPU                                               GHz     Score
Opteron 2360SE                                    2.5     95198
Xeon 5472 Harpertown w/800 MHz memory             3.0     91349
Opteron 2350                                      2.0     90273
Xeon 5472 Harpertown                              3.0     85948
Xeon E5365 Clovertown on Stoakley                 3.0     81934
Xeon E5365 Clovertown on Bensley                  3.0     74145
Xeon E5345 Clovertown on Bensley                  2.33    73035

Yikes! The Intel results are clearly memory starved and wouldn't catch up to Barcelona 2GHz if they ran at 4 GHz.
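The 4 GHz remark can be sanity-checked with a crude linear extrapolation from the two Clovertown-on-Bensley scores above. This is only a sketch: treating the score as linear in clock speed is an assumption (a memory-starved system would more likely flatten out further), and the function name extrapolate_score is hypothetical.

```python
# SPECjbb2005 scores quoted in this post (Sun JVM, 667 MHz memory).
BENSLEY_LOW = (2.33, 73035)    # Xeon E5345 Clovertown on Bensley
BENSLEY_HIGH = (3.0, 74145)    # Xeon E5365 Clovertown on Bensley
OPTERON_2350_SCORE = 90273     # Barcelona at 2.0 GHz


def extrapolate_score(low, high, target_ghz):
    """Linearly extend the line through two (GHz, score) points.

    Assumption: score keeps growing at the same rate per GHz, which is
    optimistic for a platform that is already bandwidth limited.
    """
    (f_low, s_low), (f_high, s_high) = low, high
    slope = (s_high - s_low) / (f_high - f_low)  # score points per GHz
    return s_high + slope * (target_ghz - f_high)


projected = extrapolate_score(BENSLEY_LOW, BENSLEY_HIGH, 4.0)
print(f"Projected Clovertown/Bensley score at 4 GHz: {projected:.0f}")
print(f"Opteron 2350 at 2.0 GHz:                     {OPTERON_2350_SCORE}")
print("Catches up at 4 GHz:", projected >= OPTERON_2350_SCORE)
```

Under that assumption the Bensley platform gains only about 1,650 points per extra GHz, so the projection at 4 GHz stays far below the 2.0 GHz Opteron's score. The Harpertown/Stoakley results have only one clock speed each in the table, so they cannot be extrapolated the same way.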
With the BEA JRockit JVM, the only way Intel Harpertown can beat Barcelona 2.5 GHz is by TURNING OFF hardware prefetch. But only do this on the Stoakley platform, not Bensley, OK? The Bensley platform loses even to the 2 GHz Barcelona, and there is nil frequency scaling on the Intel platforms.
Barcelona also wins MYSQL Linux and WINRAR. I didn't read the last benchmark page.
Petz