Vince - Also, a gentleman named Haim Barad...
Resorting to name calling, eh?
Just kidding...
Anyway, let's assume that AMD's prefetch uses the same opcode (I don't know whether it really does). The timing of prefetches can be a little tricky, so it's not easy to say up front that performance will improve by some particular percentage. However, I would expect some benefit.
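To make the timing point concrete, here is a minimal sketch of software prefetching in a streaming loop. It uses GCC/Clang's `__builtin_prefetch` extension rather than hand-written SSE, on the assumption that on x86 with SSE enabled it typically lowers to a PREFETCHh instruction. The function name and the `distance` parameter (how many elements ahead to prefetch) are hypothetical. Prefetch too close and the data isn't back in time; prefetch too far and it may be evicted before use.

```c
#include <assert.h>
#include <stddef.h>

/* Sum an array while prefetching `distance` elements ahead.
   `distance` is a hypothetical tuning knob: the best value depends on
   memory latency and on how much work each iteration does. */
double sum_with_prefetch(const double *a, size_t n, size_t distance)
{
    assert(a != NULL || n == 0);
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Only prefetch addresses inside the array (written this way
           to avoid overflow in i + distance). */
        if (distance < n - i)
            __builtin_prefetch(&a[i + distance], 0 /* read */, 3 /* keep in cache */);
        total += a[i];
    }
    return total;
}
```

The prefetch is only a hint, so the result is the same for any distance; only the speed changes, and finding the right distance is exactly the tricky part.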
BUT - HERE'S THE CATCH... even if the opcode is the same, there is a standard strategy that real application developers use when optimizing for a specific processor. The first step is to check the processor's family/model with the CPUID instruction. Depending on the result, the developer then loads the code path appropriate for that processor.
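As a rough sketch of that first step, the code below decodes the family/model/stepping fields from the EAX value returned by CPUID leaf 1 and picks a code path from the vendor string. It takes the raw register value as input instead of executing CPUID, so it stays portable; the path names and the selection rules are hypothetical illustrations, not any vendor's recommended logic. Only the base fields are decoded; newer processors add extended family/model bits that this sketch ignores.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct cpu_signature {
    unsigned family;
    unsigned model;
    unsigned stepping;
};

/* Decode the base fields of CPUID leaf 1, EAX:
   bits 3..0 stepping, bits 7..4 model, bits 11..8 family. */
struct cpu_signature decode_signature(uint32_t eax)
{
    struct cpu_signature sig;
    sig.stepping = eax & 0xF;
    sig.model    = (eax >> 4) & 0xF;
    sig.family   = (eax >> 8) & 0xF;
    return sig;
}

/* Hypothetical selection rules: Intel family 6 model 7 (Katmai) and
   later get the SSE path, AMD family 6 (Athlon) gets the 3DNow! path,
   everything else gets plain code. */
const char *pick_code_path(const char *vendor, struct cpu_signature sig)
{
    assert(vendor != NULL);
    if (strcmp(vendor, "GenuineIntel") == 0 && sig.family == 6 && sig.model >= 7)
        return "sse";
    if (strcmp(vendor, "AuthenticAMD") == 0 && sig.family == 6)
        return "3dnow";
    return "generic";
}
```

For example, a Coppermine part reporting EAX = 0x683 decodes to family 6, model 8, stepping 3 and would be sent down the "sse" path.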
This means that the vendor still has to write 3DNow! code, which means he has to decide that 3DNow! code is worthwhile for him. If he has already decided that SSE code IS worthwhile (and consequently used prefetch instructions in it), that WON'T help the Athlon, because he will have to load a 3DNow! version of his code on it. If he were to just run the SSE version of his code on an Athlon, he would run into trouble with the SSE opcodes (undefined on the Athlon, I suppose).
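A minimal sketch of why the vendor ends up maintaining separate versions: the dispatcher below picks a function pointer once, based on CPUID feature bits (leaf 1 EDX bit 25 for SSE, leaf 0x80000001 EDX bit 31 for 3DNow!). This uses feature flags instead of the family/model check described above, which is a swapped-in variant. The kernel names and the `shipped_3dnow_version` flag are hypothetical; the three kernels are plain C stand-ins so the sketch compiles anywhere. The key property is that a CPU without the SSE bit never reaches the SSE path, so it never executes opcodes it doesn't define.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CPUID_1_EDX_SSE          (1u << 25) /* CPUID leaf 1, EDX bit 25 */
#define CPUID_80000001_EDX_3DNOW (1u << 31) /* CPUID leaf 0x80000001, EDX bit 31 */

typedef void (*transform_fn)(float *v, size_t n);

/* Stand-in kernels: in a real program these would be written with
   plain x87/C, SSE, and 3DNow! instructions respectively. */
static void transform_generic(float *v, size_t n)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        v[i] *= 2.0f;
}
static void transform_sse(float *v, size_t n)   { transform_generic(v, n); }
static void transform_3dnow(float *v, size_t n) { transform_generic(v, n); }

/* Choose a kernel from the feature bits. The 3DNow! path is only
   available if the vendor actually wrote and shipped it. */
transform_fn select_transform(uint32_t leaf1_edx, uint32_t ext_edx,
                              int shipped_3dnow_version)
{
    if (leaf1_edx & CPUID_1_EDX_SSE)
        return transform_sse;
    if ((ext_edx & CPUID_80000001_EDX_3DNOW) && shipped_3dnow_version)
        return transform_3dnow;
    return transform_generic;
}
```

If the vendor never writes the 3DNow! kernel, a CPU without SSE falls back to the generic path, so the SSE work (prefetches included) does nothing for it.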
By the way (now that the "cache tweaks" are out), one of CuMine's advantages over Katmai is the number of load buffers, which was increased from 4 to 6 (a 50% increase). This is significant because more buffers let more outstanding memory requests be in flight at once, which makes prefetching more effective. I won't go into all the details, but it certainly does help.
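One simplified way to see why the buffer count matters is Little's law: sustained throughput equals the number of requests in flight divided by their latency. The sketch below assumes each buffer tracks one outstanding cache-line miss and that buffer count is the only bottleneck; both are simplifying assumptions, and the latency value used in the usage note is hypothetical.

```c
#include <assert.h>

/* Upper bound on miss bandwidth in bytes per nanosecond when at most
   `outstanding` cache-line misses can be in flight at once
   (Little's law: throughput = concurrency / latency). */
double miss_bandwidth_bound(unsigned outstanding, unsigned line_bytes,
                            double latency_ns)
{
    assert(latency_ns > 0.0);
    return (double)outstanding * line_bytes / latency_ns;
}
```

With 32-byte lines and a hypothetical 64 ns miss latency, 4 buffers bound the rate at 4 × 32 / 64 = 2 bytes/ns and 6 buffers at 6 × 32 / 64 = 3 bytes/ns. That is the same 50% as the buffer count itself, and only holds while the buffers, not the bus or DRAM, are the limit.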
Haim