Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: Paul Engel who wrote (43750) 6/11/2001 9:18:17 PM
From: combjelly
 
"I suppose you have made extensive - but non-existent-simulations to verify these "hypothetical" results !"

I know you are not as dumb as you try to act sometimes. As you should be aware, if a NUMA system is built as specified by the diagram that AMD likes to flash around, each processor gets at least as much bandwidth as a PC100 system. It could be getting more; that all depends on the statistics of the access patterns of the other processors in the system. So the worst case yields a difference of maybe 15%. That 15% is what you see if you compare a PC100 Athlon against a PC2100 (but registered) Athlon at the same clock rate. Because the average hit rate on the cache system is greater than 90%, there is a high probability that any individual processor that has to go to main memory gets the full bandwidth of the chipset, so the difference is likely to be less.

Now, the data transfer rate to each of the north bridges is 1.6 GBytes per second, so the transfer rate across the HTT bridge would be about the same as the maximum data transfer rate of the memory itself, minus the latency of the HTT, which is an unknown at this time. But what the hey, say that the latency cuts the effective bandwidth down to half that of the local northbridge, and say, for argument's sake, that all of the transfers are non-local. And, what the heck, say the average hit rate of the L2 cache is down around 50% because the locality of the program/data is not real good. Still, this puts the typical performance of each individual processor at something like 80% of a typical Athlon PC2100 system.

Assuming that a 1 meg L2 cache system would have a hit rate of 100%, that would only put the performance of the system at 90% of a PC2100 system, or a 12.5% increase over the 80% figure. So the worst case for a 256k or so system versus a 1 meg L2 cache system would put the performance increase at 12.5% for a cost of, oh say, $500 per processor. Using the hypothetical 512k L2 cache would likely reduce the difference to 8% or less, and note that this assumes the 1 meg gives a hit rate of 100%, a dubious assumption at best...
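
For anyone who wants to check the arithmetic, here is a quick Python sketch. It only reworks the round numbers quoted above (80% and 90% of a PC2100 system, and the guessed $500 per processor); the function and variable names are made up for illustration, and the dollars-per-point figure is just the obvious division, not a number from any real price list.

```python
def relative_gain_percent(baseline_pct, improved_pct):
    """Percent improvement of improved_pct over baseline_pct."""
    return (improved_pct / baseline_pct - 1.0) * 100.0


# Figures taken from the post: performance relative to a PC2100 Athlon system.
small_cache_pct = 80.0   # 256k-or-so L2, pessimistic assumptions
large_cache_pct = 90.0   # 1 meg L2, assumed 100% hit rate

# Guessed extra cost of the 1 meg part, per processor (from the post).
extra_cost_dollars = 500.0

gain = relative_gain_percent(small_cache_pct, large_cache_pct)
dollars_per_point = extra_cost_dollars / gain

print(f"gain from the bigger cache: {gain:.1f}%")
print(f"cost per percentage point of gain: ${dollars_per_point:.2f}")
```

Note that the 12.5% is relative to the 80% baseline; measured against the PC2100 reference system the gap is only 10 percentage points.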