To: Magrathea who wrote (198065) 5/19/2006 10:15:55 PM
From: pgerassi

Dear Magrathea:

You forget that NGA will not have a 4P variant this year, and the platform won't be tested by server customers until 2008 at the earliest.

The reason Intel went with dual FSBs for Blackford is that 4 cores on a single FSB don't get more than twice the performance of a single core. Most of that gain comes from going from 1 core to 2 (1.6x IIRC). With Opterons, it's more like 1.9x from 1 core to 2 and 3.4x from 1 core to 4. That is why Xeon MP is so slow versus Opteron 8xx. With dual core, Xeon gets that 1.6x for a single socket. If a second socket were on the same FSB, it would get only an additional 1.25x. With two FSBs, the second socket gets more like 1.6x. Thus 4 cores get 2x (1.6 x 1.25) on one FSB and about 2.6x (1.6 x 1.6) on two. Intel can't do 4S with DCs on one FSB; they need two. Thus 8 NGA cores are only about 3.3 times faster than 1 core. 8 core Opteron systems (4S DC) get about 5.6x that of one core.

Lastly, Intel can't use current implementations for 4S QC, as only 4 cores are allowed on any given FSB. DS QC will likely get 3.2 times a single core with dual FSBs. This is likely why they went to shared cache, so that they can put more than 4 cores on an FSB. Although they may now do a QC 4S dual FSB system, the FSB bottleneck itself is likely to limit performance to 5.2x even in the 16 core case.

Virtualization will cause this to decrease as the load becomes more dispersed (read: much larger working sets and lower locality). Cache helps in focused server applications, but virtualization works against it, as each active application added to the physical server increases the cache needs. And when cache needs are much above the amount of cache present, you get thrashing and major performance losses. Prefetching actually brings on the onset of this much sooner, and once thrashing occurs, prefetching just makes it far worse, as the prefetched data is flushed before it can be used.
It can get to a point where one prefetcher is fighting another and nothing gets done.

So the chance of Dell cancelling 4S Opterons is remote. Intel will not have an effective solution for some time.

Pete

PS: If you think about the performance increases in the second paragraph, even if NGA has a 20% lead in 1S, it is a wash in 2S and a 30% deficit in 4S.

PPS: Linpack is not a good server benchmark, as it fits into cache and has very little communication between threads. Most HPC users ignore it (it's in neither the SPEC nor the HPC benchmarks) unless it uses matrices far larger than the total cache size (larger than 8K x 8K arrays for 8MB of cache, i.e. 2 x 4MB on 2 x NGA). Given the scores, the array size used was small.
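
For anyone who wants to check the arithmetic behind the PS, here is a rough sketch in Python. The speedup figures are the ones quoted from memory in the post, not measurements. Mapping the socket counts onto 2, 4 and 8 cores (dual-core parts), treating the 20% lead as a per-core advantage, and the names `four_core_xeon` and `nga_vs_opteron` are all assumptions made for illustration.

```python
# Speedups over one core, as quoted in the post (the author's recollection, not measurements).
XEON_DUAL_FSB = {2: 1.6, 4: 2.6, 8: 3.3}
OPTERON = {2: 1.9, 4: 3.4, 8: 5.6}


def four_core_xeon(second_step):
    """Compose the 1 -> 2 core gain (1.6x) with the gain from adding a second dual-core socket."""
    return 1.6 * second_step


def nga_vs_opteron(cores, single_core_lead=0.20):
    """Ratio of NGA to Opteron throughput; the per-core lead is a hypothetical input."""
    return (1.0 + single_core_lead) * XEON_DUAL_FSB[cores] / OPTERON[cores]


if __name__ == "__main__":
    print(f"4 cores, one FSB:  {four_core_xeon(1.25):.2f}x")
    print(f"4 cores, two FSBs: {four_core_xeon(1.6):.2f}x")
    for cores in (2, 4, 8):
        print(f"{cores} cores: NGA at {nga_vs_opteron(cores):.2f} of Opteron")
```

Under those assumptions the ratio comes out near 1.01 at 2 cores, 0.92 at 4 cores and 0.71 at 8 cores, so the 8 core case lands close to the 30% deficit mentioned in the PS.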