Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: Saturn V who wrote (53149), 8/30/2001 1:08:49 AM
From: Ali Chen
 
SaturnV, "Since the Athlon has such a superior cache, Tier 1 Server vendors are falling over themselves to introduce Athlon servers."

While it is true that little is known about Athlon MP cache performance on server applications, it would be highly premature to conclude that cache performance is the reason for the less-than-wide acceptance of Athlon servers.

You probably also have very little idea how much it costs to arrange for TPC benchmarking.



To: Saturn V who wrote (53149), 8/30/2001 5:59:23 AM
From: pgerassi
 
Saturn V:

You are just as illogical, since you failed to comprehend that the post covers a method for using any given data set to roughly calculate how much effect a cache of a given size and associativity would have on a typical usage pattern.

The exact parameters of the L1 data cache are 128 cache lines of 64 bytes each, in 32 sets of 4 ways each. The latency for a cache hit is 2 cycles for an integer data request and 6 cycles for a floating point number (no guidance is given as to what the latency is for an MMX or SSE(2) request); this information is on page 40 of 331 in the P4 optimization manual at developer.intel.com. The L2 cache has 2048 cache lines of 128 bytes each, organized as a pair of 64 byte sub-lines, in 256 sets of 8 ways each. The latency for data is 7 cycles on a cache hit (on top of the 2 cycles needed to determine an L1 miss). A memory access takes between 132 and 252 cycles for a 2GHz P4 (the chip set will of course add to this at 20 cycles per bus cycle at 100MHz), assuming no bus contention. So an L2 cache miss would take at least 141 cycles (testing shows that the number tends to come in above 261 cycles).
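For what it's worth, that geometry is self-consistent: 128 lines of 64 bytes in 4 ways is an 8 KB L1 in 32 sets, and 2048 lines of 128 bytes in 8 ways is a 256 KB L2 in 256 sets. A quick Python sketch, if anyone wants to check the arithmetic (the CacheLevel class and its field names are just mine, for illustration):

from dataclasses import dataclass

@dataclass
class CacheLevel:
    lines: int       # number of cache lines
    line_bytes: int  # bytes per line
    ways: int        # associativity

    @property
    def sets(self):
        return self.lines // self.ways

    @property
    def size_bytes(self):
        return self.lines * self.line_bytes

# P4 figures as quoted above
l1 = CacheLevel(lines=128, line_bytes=64, ways=4)
l2 = CacheLevel(lines=2048, line_bytes=128, ways=8)

print(l1.sets, l1.size_bytes)  # 32 sets, 8192 bytes (8 KB)
print(l2.sets, l2.size_bytes)  # 256 sets, 262144 bytes (256 KB)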

Plugging in these numbers for integer code needs only the frequency of data memory reads, measured in cycles per read. Using the estimate that 1 out of 4 integer operations requires going out to memory (i.e., 2 cycles per read), the L1 cache has a hit rate of about 85% or so. This yields about 27 cycles per L2 request and a frequency divisor of 2.5 for L1. The L2 cache has a hit rate of about 92% or so, for a frequency divisor of 2.0 for L2. The total frequency divisor is 5 for integer operations.
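To make that arithmetic explicit, here is a rough Python sketch of the integer case using the hit rates and latencies quoted above. How the levels get combined into an average cost per read and a slowdown factor is my own rough take, so it will not land exactly on the 27-cycle or 2.5/2.0 figures:

# Rough average cycles per data read for integer code, using the hit
# rates and latencies quoted above; how the terms are combined is an
# assumption of mine, for illustration only.
L1_HIT_RATE   = 0.85
L2_HIT_RATE   = 0.92
L1_HIT_CYCLES = 2           # integer load on an L1 hit
L2_HIT_CYCLES = 2 + 7       # 2 cycles to detect the L1 miss, 7 for the L2 hit
MEM_CYCLES    = (141, 261)  # best case vs. what testing reportedly shows

READ_INTERVAL = 2           # one data read every 2 cycles (1 per 4 integer ops)

def avg_read_cycles(mem_cycles):
    # Weight each level's latency by how often a read is actually served there.
    return (L1_HIT_RATE * L1_HIT_CYCLES
            + (1 - L1_HIT_RATE) * L2_HIT_RATE * L2_HIT_CYCLES
            + (1 - L1_HIT_RATE) * (1 - L2_HIT_RATE) * mem_cycles)

for mem in MEM_CYCLES:
    avg = avg_read_cycles(mem)
    slowdown = (READ_INTERVAL + avg) / READ_INTERVAL  # crude "frequency divisor"
    print(f"memory at {mem} cycles: avg read {avg:.1f} cycles, slowdown ~{slowdown:.1f}x")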

For x87 FP ops, there are 4 cycles per request. The L1 cache has about an 80% hit rate or so. This yields about 50 cycles per L2 request and a frequency divisor of 2.8 for L1. The L2 cache has a hit rate of 92% or so, which yields an L2 frequency divisor of about 1.7, for a total frequency divisor of 4.7 for x87 FP ops. For SSE(2) ops, the L1 frequency divisor is 4.6, for a total of 7.7.
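The same sketch works for the x87 case, with the 6-cycle FP L1 latency and the 80%/92% hit rates plugged in. Again, how the terms combine and the use of the 261-cycle memory figure are my assumptions, so expect only a ballpark rather than exactly 4.7:

# Same estimate for x87 FP loads; the 6-cycle L1 latency and the
# 80%/92% hit rates are from the text above, the rest is my assumption.
L1_HIT_RATE   = 0.80
L2_HIT_RATE   = 0.92
L1_HIT_CYCLES = 6      # FP load on an L1 hit
L2_HIT_CYCLES = 6 + 7  # assumed: FP miss detection plus the 7-cycle L2 hit
MEM_CYCLES    = 261    # the "above 261 cycles" figure from testing
READ_INTERVAL = 4      # 4 cycles per FP data request, per the estimate above

avg = (L1_HIT_RATE * L1_HIT_CYCLES
       + (1 - L1_HIT_RATE) * L2_HIT_RATE * L2_HIT_CYCLES
       + (1 - L1_HIT_RATE) * (1 - L2_HIT_RATE) * MEM_CYCLES)
print(f"avg FP read {avg:.1f} cycles, slowdown ~{(READ_INTERVAL + avg) / READ_INTERVAL:.1f}x")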

Do the same for the Athlon and you will see that it has smaller frequency divisors, mostly due to its 8x larger L1 and its exclusive cache design.

As to server penetration, Intel took a lot longer than you seem to remember. Xeons took a long time to gain penetration. It has been only 11 weeks since the introduction of AMD server systems. If they hit 15% in units within a year, they will have done it much faster than Intel did.

Pete