fyo,
"Being an "fp scientist" myself, I can say that's NOT true. "
Well, sorry, but it is true for my application. The 1.5 GHz P4 is faster than the 1.33 GHz Athlon and close in performance to the best IBM RS/6000 chips (43P270 to be precise, Power3/370MHz), for a fraction of the cost.
"The P4 is also a bit sensitive when it comes to square roots and 1/x operations"
Most of my code is linear algebra, but it also makes plenty of calls to dexp, erf, sqrt, etc.
"For the work I'm currently doing, I use relatively large datasets,"
The benchmarks I was quoting used less than 30 Mwords (256 MB) of memory. I'm not sure whether that counts as "small" or "large" for you, but it is small for me, since my runs can easily use much more memory, e.g., when diagonalizing a 5000x5000 matrix.
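For scale, here is a rough back-of-the-envelope sketch of the memory involved. It assumes a "word" is one 8-byte double-precision value (REAL*8), uses decimal megabytes, and counts only the dense arrays, not solver workspace; the function names are made up for illustration. Under those assumptions 30 Mwords is about 240 MB, and a single 5000x5000 matrix is about 200 MB, so keeping the matrix plus one more n x n array (for example, the eigenvectors) already goes past the benchmark footprint.

```python
# Back-of-the-envelope memory footprint for dense double-precision data.
# Assumption: one "word" = one 64-bit double (REAL*8 in Fortran).
BYTES_PER_WORD = 8


def megabytes(n_words: int) -> float:
    """Size in decimal megabytes (10**6 bytes) of n_words 8-byte words."""
    return n_words * BYTES_PER_WORD / 1e6


def dense_matrix_mb(n: int, copies: int = 1) -> float:
    """Storage for `copies` dense n x n double-precision matrices (no workspace)."""
    return megabytes(n * n * copies)


if __name__ == "__main__":
    print(f"30 Mwords benchmark limit : {megabytes(30 * 10**6):6.1f} MB")
    print(f"one 5000x5000 matrix      : {dense_matrix_mb(5000):6.1f} MB")
    # Hypothetical case: original matrix kept alongside an n x n eigenvector array.
    print(f"matrix + eigenvector copy : {dense_matrix_mb(5000, 2):6.1f} MB")
```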
"(regardless of optimization - all the inner loops are hand optimized in assembly)."
Actually, the P4 did quite well (without SSE2) with code built by the Portland Group Fortran compiler under Red Hat Linux, even against the heavily optimized IBM libraries (matrix-matrix and matrix-vector multiply) on the 43P270.
Anyhow, I thought the SPECfp score for the P4 was "cooked". I now have to concede that the P4 performs extremely well for my application (even without SSE2). I suspect most of the advantage comes from memory bandwidth, but I don't know for sure. It will be interesting to see how the P4 does on my application once DDR memory becomes available for it. We shall see.
Regards, fp