Politics : Formerly About Advanced Micro Devices


To: dumbmoney who wrote (50639) 2/23/1999 1:34:00 PM
From: Paul Engel
 
dumbmoney - Here is a better example of Clive Turvey's KNI benchmark tests.

Note the multiplication & division SPEEDUPS with KNI vs. P5 and P6.

Paul

{===========================}
tbcnet.com

Update 11:15pm 11-Feb-99 - Here are a few interim results. I'm having a little trouble with the compiler: I have been unable to get a KNI version of MFLOPS using single precision to work properly, so I'll take a slightly different tack and stick with a few primitive operations on simple arrays. Division is one of the more complex floating-point operations, and KNI appears to provide a 3.4x increase in throughput. It should be noted that 3DNow! doesn't offer a division operation; instead you take the reciprocal of the divisor and then multiply. The KNI reciprocal functions have only 12-bit accuracy, so their use in this fashion is probably inadvisable.
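To make the reciprocal-then-multiply point concrete, here is a rough Python sketch. It does not touch KNI or 3DNow! at all. The names `approx_reciprocal` and `divide_via_reciprocal` are made up for illustration, and rounding 1/x to 12 significant bits is only an assumed model of a 12-bit reciprocal, not the actual hardware behavior.

```python
import math


def approx_reciprocal(x: float, bits: int = 12) -> float:
    """Model a low-precision reciprocal: 1/x rounded to `bits` significant bits.

    ASSUMPTION: this rounding model is illustrative only; real hardware
    reciprocal instructions may behave differently.
    """
    mantissa, exponent = math.frexp(1.0 / x)  # 1/x = mantissa * 2**exponent
    scale = 1 << bits
    return math.ldexp(round(mantissa * scale) / scale, exponent)


def divide_via_reciprocal(a: float, b: float, bits: int = 12) -> float:
    """Compute a / b as a * (approximate 1/b), as described above."""
    return a * approx_reciprocal(b, bits)


def relative_error(a: float, b: float, bits: int = 12) -> float:
    """Relative error of the reciprocal-based quotient against true division."""
    exact = a / b
    return abs(divide_via_reciprocal(a, b, bits) - exact) / abs(exact)


if __name__ == "__main__":
    for a, b in [(1.0, 3.0), (10.0, 7.0), (355.0, 113.0)]:
        print(f"{a}/{b}: relative error {relative_error(a, b):.2e}")
```

Under this model the quotient is only good to about 12 bits, well short of the 24-bit significand of a single-precision float, which is the concern raised above.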

Optimized using instructions for:   P5         P6         KNI

Cycles per addition                 3.4688     3.9375     2.3594
Cycles per multiplication           3.9141     3.9063     2.4141
Cycles per division                 31.3828    31.4375    9.0469
Cycles per square-root              82.3047    82.820     31.0703 *

* The compiler didn't vectorize this as fully as it could have
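
The speedups can be read off the table by dividing the P5 or P6 cycle count by the KNI cycle count. A small Python sketch of that arithmetic follows. The figures are copied from the table above; the names `CYCLES` and `speedups` are just illustrative.

```python
# Cycles per operation, copied from the table above: (P5, P6, KNI).
CYCLES = {
    "addition": (3.4688, 3.9375, 2.3594),
    "multiplication": (3.9141, 3.9063, 2.4141),
    "division": (31.3828, 31.4375, 9.0469),
    "square-root": (82.3047, 82.820, 31.0703),
}


def speedups(cycles):
    """Return {operation: (P5/KNI, P6/KNI)}; a ratio above 1 means KNI is faster."""
    return {op: (p5 / kni, p6 / kni) for op, (p5, p6, kni) in cycles.items()}


if __name__ == "__main__":
    for op, (vs_p5, vs_p6) in speedups(CYCLES).items():
        print(f"{op:15s} {vs_p5:.2f}x vs P5   {vs_p6:.2f}x vs P6")
```

Fewer cycles per operation means higher throughput at the same clock speed, so each ratio is the throughput gain for that operation.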