To: greg nus who wrote (31861) 4/14/1998 4:51:00 PM From: Kevin K. Spurway
All, here's a post I found over on the Betterchips website (www.betterchips.com) describing some of the advantages of the 3D element of K6-3D:

Posted by JC on March 25, 1998 at 16:53:04:
In Reply to: Re: 30% faster...are you sure? posted by Upsilon on March 25, 1998 at 14:53:19:

: : Another reason why I'm still skeptical is that
: : every other website but this one has said that
: : the K6-3D doesn't have a pipelined fpu -- that's
: : in the K6+3D. The 3D instruction set does some
: : real nice things for the fpu, but only if software
: : takes advantage of it.

: As I understand it, assuming the 3D instructions are used
: the data takes a different path and a pipelined FPU becomes
: irrelevant. Now what will use these instructions? A lot.
: DirectX 6 will have it, so that means that any D3D
: accelerated game will use them. Rumor has it that 3Dfx is
: adding support for the instructions in the next version of
: glide, so all native 3Dfx games will use the instructions.
: In addition, there are a lot of game companies designing games
: specifically customized for AMD's instructions. The
: biggest is Quake 3, of course, although that's hardly the
: only one.

Okay, this is what (I think) we'll be able to do with the new instructions:

1. perform two operations at one time on half sized floating point numbers. Without pipelining, it still won't be too cool, I think. Take the MULT instruction. Pretend it has a latency of 8 and an initiation interval of 4. This means that, if you're pipelining, you can perform one MULT every 4 cycles. Now, with AMD3D (which, in this example, is basically MMX for the fpu), you can split the fp register into two fp's, each half normal size, and you can calculate the same function on both, allowing for two MULTs every eight ('cuz of latency) cycles.
But with the AMD3D method, in order to multiply two numbers at the same speed as Intel's fpu, you have to operate on numbers half the size, and both numbers have to be given the exact same instruction (e.g. "MULT both numbers by 4.5"). Luckily the K6 fpu has a shorter average latency than the PII fpu (2 cycles versus 3). This will help. But the pipelined fpu in the K6+3D will help more.

2. calculate a wicked-fast divide by first finding the reciprocal of the denominator and then multiplying it by the numerator. This will probably cut the time for a divide at least in half, and I think divides are unpipelined in Intel chips anyway.

3. use an iterative, very fast algorithm to find a number's square root to any precision we want. This is the best feature of the instruction set. If applied properly, you could chop by ten the time required to do a 3D transform (hey, wouldn't you like that...400fps in Quake!). But it probably won't be quite all that.

: Bring the debate on!

Just one more message after this...

-JC
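
For anyone who wants to see what points 2 and 3 of JC's post look like in practice, here is a minimal Python sketch. It assumes Newton-Raphson refinement, which is my assumption: the post doesn't name the algorithm, and this says nothing about how the K6-3D hardware or its instructions actually work. The function names (approx_reciprocal, approx_divide, approx_sqrt), the seed estimates, and the iteration counts are all made up for illustration, and inputs are assumed to be positive.

```python
import math


def approx_reciprocal(d, iterations=4):
    """Estimate 1/d for d > 0 using only multiplies and subtracts (hypothetical helper)."""
    # Split d into mantissa * 2**exponent with 0.5 <= mantissa < 1.
    mantissa, exponent = math.frexp(d)
    # Linear seed for 1/mantissa on [0.5, 1); a common choice, assumed here.
    x = 48.0 / 17.0 - (32.0 / 17.0) * mantissa
    for _ in range(iterations):
        # Newton-Raphson step: the relative error is squared on each pass.
        x = x * (2.0 - mantissa * x)
    # Undo the scaling: 1/d = (1/mantissa) * 2**(-exponent).
    return math.ldexp(x, -exponent)


def approx_divide(n, d, iterations=4):
    """Point 2: divide by multiplying the numerator by the reciprocal of the denominator."""
    return n * approx_reciprocal(d, iterations)


def approx_sqrt(a, iterations=6):
    """Point 3: iterative square root for a >= 0; more iterations give more precision."""
    if a == 0.0:
        return 0.0
    mantissa, exponent = math.frexp(a)
    if exponent % 2:
        # Make the exponent even so it can be halved exactly.
        mantissa *= 0.5
        exponent += 1
    # Now 0.25 <= mantissa < 1. Rough seed for 1/sqrt(mantissa), assumed for illustration.
    y = 2.0 - mantissa
    for _ in range(iterations):
        # Newton-Raphson step for the reciprocal square root (no divides needed).
        y = y * (1.5 - 0.5 * mantissa * y * y)
    # sqrt(a) = mantissa * (1/sqrt(mantissa)) * 2**(exponent / 2).
    return math.ldexp(mantissa * y, exponent // 2)


if __name__ == "__main__":
    print(approx_divide(1.0, 3.0))
    print(approx_sqrt(2.0, iterations=2), approx_sqrt(2.0))
```

The point of the sketch is that neither loop contains a divide: each pass is a few multiplies and a subtract, and the iteration count is the knob for "any precision we want". Stopping after one or two passes gives a rough answer cheaply, which is probably the kind of trade-off a 3D transform would make.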