JC,
Actually, SSE2 is without doubt a better floating-point instruction set than x87. Even without the advantage of SIMD, it lessens pressure on the decoder end (because you don't need to use FXCH to shuffle the stack around). However, Intel's implementation of SSE2 is disappointing. See, it's essentially scalar in nature, as are Willamette's x87 and MMX implementations. If AMD implemented superscalar SSEx (and I'm not talking about the double instruction pumping that P6 has to do for SSE) just like they have superscalar x87 and MMX, then SSE2 would become a much more attractive option to several coders that I know.
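To make the flat-register point concrete, here is a rough sketch in C of what packed SSE2 code looks like from the coder's side. The function name `add_doubles` is made up for illustration, and the `__SSE2__` guard is just an assumption about how the compiler advertises SSE2 support; without it the code falls back to a plain scalar loop. It says nothing about how fast any particular CPU executes these instructions.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: adds two arrays of doubles element by element. */
void add_doubles(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Packed SSE2 path: two 64-bit adds per instruction, using flat XMM
       registers, so there is no register stack to rearrange with FXCH. */
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
#endif
    /* Scalar tail, and the full fallback when SSE2 is not available. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

Whether the two adds in each `_mm_add_pd` actually complete together or get pumped through one at a time is exactly the implementation question above; the source code looks the same either way.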
Of course, caches and memory will have to get a lot better to meet the throughput requirement of eight sustained 32-bit operands per cycle (or four sustained 64-bit operands per cycle). Heck, caches and memory still need to catch up with the original Athlon's superscalar x87, which is obviously underperforming its own specs!
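For a feel of what those operand rates mean in bytes, here is a back-of-the-envelope sketch in C. The helper names are made up, and the 1 GHz clock in the usage note is purely an assumed round number, not a figure for any specific chip.

```c
#include <stdint.h>

/* Operand traffic in bytes per cycle for a given operand rate and width. */
uint64_t bytes_per_cycle(uint64_t operands_per_cycle, uint64_t operand_bits)
{
    return operands_per_cycle * operand_bits / 8;
}

/* Sustained bandwidth in bytes per second at a given clock rate in Hz. */
uint64_t bytes_per_second(uint64_t bytes_each_cycle, uint64_t clock_hz)
{
    return bytes_each_cycle * clock_hz;
}
```

Both cases work out the same: `bytes_per_cycle(8, 32)` and `bytes_per_cycle(4, 64)` each give 32 bytes per cycle. At an assumed 1 GHz, `bytes_per_second(32, 1000000000ULL)` is 32,000,000,000 bytes per second of operands that something has to feed.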
I have no doubt a SIMD FPU does way better than old x87, but who ever doubted that the Alpha CPU was better than any x86 one... the big issue is compatibility. Look at the Emotion Engine, which runs at a crappy 300MHz and throws up 10 times as many polygons as the most beefed-up x86 setup.
This is Intel we're talking about. The company with three times as many x86 fabs as every competitor combined. If the lowest performing Duron beats the highest performing P4XP next year (incredibly unlikely, of course), then Intel will still dominate at least three quarters of the market.
By f*ck*d I meant that they won't be able to save their asses by having better SSE support in some benches. There is no scenario in my mind where AMD outships Intel. But if nano-AMD succeeds in outperforming giga-Intel for 1-2 years, then I'll be awesomely proud of them and they should see rapid growth.
Max