Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: Petz who wrote (4198) 8/9/2000 7:33:58 PM
From: Bilow
 
Hi John Petzinger; Re estimating DDR performance by doubling the delta between PC100 and PC133...

This is actually pretty complicated, which of course is why I asked for opinions on it. The situation with the Nvidia GeForce2 is simple in that you can overclock the processor and the memory separately. With those cards, it is very clearly a memory bandwidth issue, not a processor speed bottleneck. In addition, they don't seem to have the issue of CL2 vs CL3, which complicates PCs, nor does that analysis take into account the difference between DDR and SDRAM.

One of the problems with scaling this stuff is that the scaling laws for bottlenecks (and overall system performance) don't hold when the circuit topology changes. For instance, increasing the size of the cache will decrease the dependency on memory (and the FSB).

The one thing you can do is figure out how much performance is limited by the FSB / memory. You can do this by comparing the performance advantage of increasing the FSB / memory bandwidth to the advantage of increasing the processor MHz.

We all agree that if you speed up every process in a computer by 10%, you will end up with a machine that goes 10% faster. So you compare a 1GHz machine with an FSB of 100MHz to a 1.1GHz machine with the same FSB. That gives you the CPU contribution to performance. The difference will be less than 10%, maybe, I don't know, say 7%. (If I did know, I wouldn't be asking.) That implies that the memory bottleneck is responsible for about 30% of the overall system bottleneck, and that therefore improving the memory performance by 10% will increase system performance by about 3%.
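
To make the arithmetic in that paragraph concrete, here is a minimal Python sketch. It uses the same first-order approximation as the guess above (performance gain divided by clock gain), and the 7% figure is still only my guess, not a measurement. The function name `cpu_share` is made up for illustration.

```python
def cpu_share(observed_gain: float, clock_gain: float) -> float:
    """First-order estimate of the fraction of run time that scales with CPU clock.

    observed_gain: fractional system speedup measured with the faster CPU
    clock_gain:    fractional increase in CPU clock, FSB held fixed
    """
    if clock_gain <= 0:
        raise ValueError("clock_gain must be positive")
    return observed_gain / clock_gain


# Guessed figures from the post: 10% more MHz buys 7% more performance.
share = cpu_share(observed_gain=0.07, clock_gain=0.10)
memory_share = 1.0 - share

# Expected benefit of a 10% faster memory system under the same approximation.
memory_gain = memory_share * 0.10

print(f"CPU share    ~ {share:.2f}")
print(f"memory share ~ {memory_share:.2f}")
print(f"10% faster memory ~ {memory_gain:.1%} faster system")
```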

But when we go from PC133 to PC2100, we are not doubling memory performance. We are only doubling memory bandwidth, while latency stays the same. So then you have to figure out how much of memory performance is latency limited and how much is bandwidth limited. You figure that out by making four performance tests: PC100 CL2, PC100 CL3, PC133 CL2, PC133 CL3. (If the world were a perfect place, you would be able to predict the fourth measurement from the first three. In fact, that is how you can tell how good the model is.)
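
One way to read the "predict the fourth from the first three" check, assuming the CAS latency effect and the clock effect simply add (that additivity is my assumption here, and the run times below are hypothetical placeholders, not benchmark results):

```python
def predict_fourth(t_100_cl2: float, t_100_cl3: float, t_133_cl3: float) -> float:
    """Predict the PC133 CL2 run time from the other three, assuming additivity."""
    cl_effect = t_100_cl2 - t_100_cl3  # change from CL3 to CL2, measured at PC100
    return t_133_cl3 + cl_effect


def model_error(predicted: float, measured: float) -> float:
    """Relative error of the prediction; a small value suggests the model holds."""
    return (predicted - measured) / measured


# Hypothetical run times in seconds, for illustration only.
predicted = predict_fourth(t_100_cl2=10.0, t_100_cl3=10.4, t_133_cl3=10.1)
print(f"predicted PC133 CL2: {predicted:.2f} s")
print(f"error vs a measured 9.8 s: {model_error(predicted, 9.8):+.1%}")
```

If the error comes out large on real measurements, the simple additive model is not good enough to extrapolate to DDR.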

Now write the performance equation:

Ttot = Total time required to complete task.
Tcpu = Time spent by CPU. (i.e. not waiting on memory.)
Tmem = Time spent by memory.
Tlat = Time spent waiting for memory latency.
Tmbw = Time spent waiting for memory bandwidth.

Ttot = Tcpu + Tmem = Tcpu + Tlat + Tmbw.

(Note that the above descriptions are not meant to be interpreted literally, they are only markers for the system performance due to the various parameters.)

As an example, if the CPU is 70% of the bottleneck (as I implied with my total guess a few paragraphs ago), then the terms of this equation, before and after a 10% speedup of each part (CPU from 1GHz to 1.1GHz, FSB from 100MHz to 110MHz), would be:

1GHz CPU, 100MHz FSB: Tcpu = 0.70, Tmem = 0.30
1.1GHz CPU, 110MHz FSB: Tcpu = 0.63, Tmem = 0.27

This gives the four possible system speeds for 1GHz and 1.1GHz CPUs, and 100MHz and 110MHz FSBs. (Though some of these combinations may not be possible to implement in the real world.)

1GHz w/ 100MHz FSB = 0.7 + 0.3 = 1.0 seconds (+0%)
1GHz w/ 110MHz FSB = 0.7 + 0.27 = 0.97 seconds (+3%)
1.1GHz w/ 100MHz FSB = 0.63 + 0.3 = 0.93 seconds (+7%)
1.1GHz w/ 110MHz FSB = 0.63 + 0.27 = 0.90 seconds (+10%)
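
The four rows above can be reproduced with a few lines of Python. This is only a sketch of the same rough model: it keeps the 70/30 guess and the convention that "10% faster" means 10% less time (strictly, a 1.1x clock cuts the time by about 9.1%). Times are held in percent of the baseline run so the arithmetic stays exact.

```python
T_CPU = 70  # percent of the baseline run attributed to the CPU (a guess)
T_MEM = 30  # percent attributed to FSB / memory


def total_time(cpu_gain_pct: int, fsb_gain_pct: int) -> float:
    """Ttot = Tcpu + Tmem, each term reduced by its part's speedup in percent."""
    t_cpu = T_CPU * (100 - cpu_gain_pct) / 100
    t_mem = T_MEM * (100 - fsb_gain_pct) / 100
    return t_cpu + t_mem


for cpu_label, cpu_gain in (("1GHz", 0), ("1.1GHz", 10)):
    for fsb_label, fsb_gain in (("100MHz", 0), ("110MHz", 10)):
        t = total_time(cpu_gain, fsb_gain)
        print(f"{cpu_label} w/ {fsb_label} FSB = {t / 100:.2f} seconds (+{100 - t:.0f}%)")
```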

You do the same thing, but with the addition of two more varieties of memory, and you can get some sort of estimate for DDR. DDR only improves Tmbw, not Tcpu or Tlat.
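
As a rough sketch of that DDR estimate: going from PC133 to PC2100 doubles bandwidth while latency stays the same, so in this model only the bandwidth term shrinks. Halving Tmbw outright is a simplifying assumption, and the split of Tmem into 0.25 and 0.05 below is purely hypothetical; the real split is what the four CL2/CL3 tests are supposed to tell you.

```python
def ddr_estimate(t_cpu: float, t_lat: float, t_mbw: float) -> float:
    """Estimated total time when bandwidth doubles and latency is unchanged."""
    return t_cpu + t_lat + t_mbw / 2.0


# Hypothetical split of the guessed Tmem = 0.30 into latency and bandwidth parts.
t_cpu, t_lat, t_mbw = 0.70, 0.25, 0.05

before = t_cpu + t_lat + t_mbw
after = ddr_estimate(t_cpu, t_lat, t_mbw)
saved = (before - after) / before

print(f"SDRAM: {before:.3f} s, DDR: {after:.3f} s, time saved: {saved:.1%}")
```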

Because of all this, I tend to agree with Ali that DDR will provide only a couple percent performance improvement, at best, for single processor machines running relatively low frequencies.

To really see DDR kick butt will require 4x multiprocessors or 2GHz and above.

-- Carl

P.S. Forgive me for typing this in so hurriedly (and without much proofreading), but I have to go back to my regular job...



To: Petz who wrote (4198) 8/9/2000 8:39:11 PM
From: pgerassi
 
Dear John:

Re: Bandwidth testing

The best test would be a K6-3 on a Super 7 motherboard with a 66 MHz bus and CAS2 SDRAM, against the same K6-3 at the same core clock (remember K6-3 multipliers are unlocked) with a 100 MHz bus and CAS3 SDRAM. Both would have the same chipset, latency (133 ns), and CPU, but the test at 100 MHz would have 50% more bandwidth. All parameters given are normal (no overclocking or underclocking) for a given system. Any increase in speed (runtime@66/runtime@100 - 1) would be strictly bandwidth related. Doubling the increase would be a good rule of thumb for the DDR transition.
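
A minimal sketch of that rule of thumb in Python, taking the speed increase as how much longer the 66 MHz run takes relative to the 100 MHz run. The function names and the run times are hypothetical, for illustration only.

```python
def bandwidth_gain(runtime_66: float, runtime_100: float) -> float:
    """Fractional speed increase going from the 66 MHz run to the 100 MHz run."""
    if runtime_66 <= 0 or runtime_100 <= 0:
        raise ValueError("run times must be positive")
    return runtime_66 / runtime_100 - 1.0


def ddr_rule_of_thumb(runtime_66: float, runtime_100: float) -> float:
    """Rule of thumb from the post: double the measured bandwidth-only gain."""
    return 2.0 * bandwidth_gain(runtime_66, runtime_100)


# Hypothetical run times in seconds for the same benchmark on both setups.
gain = bandwidth_gain(runtime_66=103.0, runtime_100=100.0)
estimate = ddr_rule_of_thumb(runtime_66=103.0, runtime_100=100.0)

print(f"gain from 50% more bandwidth: {gain:.1%}")
print(f"rule-of-thumb DDR gain:       {estimate:.1%}")
```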

Pete

PS: The K6-3 has the same size L2 (256K) as well.