To: Petz who wrote (4198)
From: Bilow
8/9/2000 7:33:58 PM

Hi John Petzinger;

Re estimating DDR performance by doubling the delta between PC100 and PC133...

This is actually pretty complicated, which of course is why I asked for opinions on it.

The situation with the Nvidia GeForce2 is simple in that you can overclock both the processor and the memory separately. With those cards, it is very clearly a memory bandwidth issue, not a processor speed bottleneck. In addition, they don't seem to have the issue of CL2 vs CL3, which complicates PCs, nor does that analysis take into account the difference between DDR and SDRAM.

One of the problems with scaling this stuff is that the scaling laws for bottlenecks (and overall system performance) don't hold when the circuit topology changes. For instance, increasing the size of the cache will decrease the dependency on memory (and on the FSB).

The one thing you can do is figure out how much performance is limited by the FSB / memory. You can do this by comparing the performance advantage of increasing the FSB / memory bandwidth to the advantage of increasing the processor MHz.

We all agree that if you speed up every process in a computer by 10%, you will end up with a machine that goes 10% faster. So you compare a 1GHz machine with an FSB of 100MHz to a 1.1GHz machine with the same FSB. That gives you the CPU contribution to performance. The difference will be less than 10%, maybe, I don't know, say 7%. (If I did know, I wouldn't be asking.) If the CPU accounts for 7 of the 10 points, the memory bottleneck is responsible for the remaining 30% or so of the overall system bottleneck, and therefore improving the memory performance by 10% will increase system performance by about 3%.

But when we go from PC133 to PC2100, we are not doubling memory performance. We are only doubling memory bandwidth, while latency stays the same. So then you have to figure out how much of memory performance is latency limited and how much is bandwidth limited.
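The CPU-scaling argument above can be sketched in a few lines of Python. This is only an illustration of the first-order reasoning in the post (percentage gains treated as additive). The function names are made up for the example, and the 7% figure is the guess from the post, not a measurement.

```python
def memory_share(cpu_speedup, observed_gain):
    """Fraction of the bottleneck attributed to FSB / memory.

    cpu_speedup:   fractional CPU clock increase (0.10 for 1GHz -> 1.1GHz)
    observed_gain: fractional system speedup seen with the FSB held fixed
    Assumes the linear, first-order model used in the post.
    """
    cpu_share = observed_gain / cpu_speedup
    return 1.0 - cpu_share


def predicted_gain(share, component_speedup):
    """System gain expected from speeding up one component by the given fraction."""
    return share * component_speedup


# The post's guess: a 10% faster CPU yields about 7% more performance.
share = memory_share(cpu_speedup=0.10, observed_gain=0.07)
print(f"memory share of bottleneck: {share:.0%}")
print(f"gain from 10% faster memory: {predicted_gain(share, 0.10):.1%}")
```

With a real pair of benchmark results in place of the guess, the same two calls give the memory share for that workload.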
You figure that out by making four performance tests: PC100 CL2, PC100 CL3, PC133 CL2, PC133 CL3. (If the world were a perfect place, you would be able to predict the fourth measurement from the first three. In fact, that is how you can tell how good the model is.)

Now write the performance equation:

Ttot = Total time required to complete task.
Tcpu = Time spent by CPU. (i.e. not waiting on memory.)
Tmem = Time spent by memory.
Tlat = Time spent waiting for memory latency.
Tmbw = Time spent waiting for memory bandwidth.

Ttot = Tcpu + Tmem = Tcpu + Tlat + Tmbw.

(Note that the above descriptions are not meant to be interpreted literally; they are only markers for the system performance due to the various parameters.)

As an example, if the CPU is 70% of the bottleneck (as I implied with my total guess a few paragraphs ago), then the terms of this equation, before and after speeding up everything by 10%, would be:

1GHz system (100MHz FSB): Tcpu = 0.70, Tmem = 0.30
1.1GHz system (110MHz FSB): Tcpu = 0.63, Tmem = 0.27

This gives the four possible system speeds for 1GHz and 1.1GHz CPUs, and 100MHz and 110MHz FSBs. (Though some of these combinations may not be possible to implement in the real world.)

1GHz w/ 100MHz FSB = 0.70 + 0.30 = 1.00 seconds (+0%)
1GHz w/ 110MHz FSB = 0.70 + 0.27 = 0.97 seconds (+3%)
1.1GHz w/ 100MHz FSB = 0.63 + 0.30 = 0.93 seconds (+7%)
1.1GHz w/ 110MHz FSB = 0.63 + 0.27 = 0.90 seconds (+10%)

You do the same thing, but with the addition of two more varieties of memory, and you can get some sort of estimate for DDR. DDR only improves Tmbw, not Tcpu or Tlat, since it doubles bandwidth while latency stays the same.

Because of all this, I tend to agree with Ali that DDR will provide only a couple percent performance improvement, at best, for single processor machines running at relatively low frequencies. To really see DDR kick butt will require 4x multiprocessors or 2GHz and above.

-- Carl

P.S. Forgive me for typing this in so hurriedly (and without much proofreading), but I have to go back to my regular job...
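For anyone who wants to play with the numbers, here is a rough Python sketch of the Ttot = Tcpu + Tlat + Tmbw model from the post. It uses the same first-order scaling as the example (a part that is 10% faster takes about 10% less time) and the guessed 70/30 split. The function names are invented for the sketch, and the latency/bandwidth split passed to ddr_time is a pure placeholder, since that split is exactly what the four memory tests are meant to measure.

```python
def scaled(t, gain):
    """First-order scaling from the post: a part that is `gain` faster
    takes roughly `gain` less time (0.70 -> 0.63 for a 10% gain)."""
    return t * (1.0 - gain)


def total_time(t_cpu, t_mem, cpu_gain=0.0, mem_gain=0.0):
    """Ttot = Tcpu + Tmem, with each term scaled independently."""
    return scaled(t_cpu, cpu_gain) + scaled(t_mem, mem_gain)


def ddr_time(t_cpu, t_lat, t_mbw):
    """Assumes DDR doubles bandwidth and leaves latency alone,
    so only the bandwidth term is halved."""
    return t_cpu + t_lat + t_mbw / 2.0


T_CPU, T_MEM = 0.70, 0.30  # the post's guessed split, not measured data

# The four CPU / FSB combinations from the example.
for cpu_gain, mem_gain in [(0.0, 0.0), (0.0, 0.1), (0.1, 0.0), (0.1, 0.1)]:
    t = total_time(T_CPU, T_MEM, cpu_gain, mem_gain)
    print(f"CPU +{cpu_gain:.0%}, FSB +{mem_gain:.0%}: {t:.2f} seconds")

# Placeholder split of Tmem into latency and bandwidth (hypothetical values).
T_LAT, T_MBW = 0.20, 0.10
print(f"DDR estimate with placeholder split: {ddr_time(T_CPU, T_LAT, T_MBW):.2f} seconds")
```

The smaller the bandwidth-limited share of Tmem turns out to be, the less DDR helps, which is the point of measuring the split before extrapolating.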