Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: pgerassi who wrote (74576) 3/15/2002 1:07:57 PM
From: Ali Chen
 
Dear Pete, I'm glad you are finally starting to approach the idea. You rightly noted:
"Besides, if you look at your formula, you just proved what the post stated that began all this. Surely you must have noticed that even if the clock rate was infinite, the score would not go above a certain value (C1/C3). Thus, there exists a hard limit on performance. That line is related to caches, memory speeds and latency."

It all began with an assertion that everything "scales
as 10% performance gain for 50% clock increase".
First, the whole "% speedup" approach is flawed because,
as you now see, the result depends on the starting
point: the "% speedup" in the frequency domain gets smaller
as the starting frequency goes higher and higher, so it is
not a meaningful measure of anything on its own. I just wanted to
play the game everyone understands ("% speedup"), so I stated
that the speedup from a 2000MHz starting point is much
bigger: 20% for SPEC_fp and 32% for SPEC_int, and that it
varies from application to application. As you
may see, there was quite a bit of background research
behind that statement.
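To see why a fixed "% speedup" depends on the starting point, here is a minimal Python sketch. It assumes the linear time-domain model from the abstract quoted in this post, written as T = T0 + k/f, with hypothetical values T0 = 20 s and k = 100000 s·MHz chosen only for illustration (not fitted to SPEC or any real CPU); the function names are likewise hypothetical.

```python
# Minimal sketch, assuming the linear time-domain model T(f) = T0 + k / f:
# T0 is the frequency-independent (memory-bound) part of the runtime and
# k / f is the core-bound part. T0 and k are hypothetical, not measured.


def runtime(f_mhz: float, t0: float, k: float) -> float:
    """Runtime in seconds at core clock f_mhz under T = T0 + k / f."""
    return t0 + k / f_mhz


def speedup_pct(f_start: float, f_end: float, t0: float, k: float) -> float:
    """Percent performance gain when the clock goes from f_start to f_end."""
    return (runtime(f_start, t0, k) / runtime(f_end, t0, k) - 1.0) * 100.0


if __name__ == "__main__":
    T0, K = 20.0, 100_000.0  # hypothetical: 20 s memory-bound, 100000 s*MHz core-bound
    for f0 in (500.0, 1000.0, 2000.0, 4000.0):
        # Same +50% clock step each time, different starting points.
        gain = speedup_pct(f0, 1.5 * f0, T0, K)
        print(f"{f0:5.0f} -> {1.5 * f0:5.0f} MHz: +{gain:.1f}%")
```

Each line of output uses the same +50% clock step, yet the gain shrinks as the starting clock rises, because the fixed T0 term makes up a growing share of the runtime.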

You may want to examine this somewhat awkward
write-up of "profiling ideas":
specbench.org

In particular, the abstract says:
"In the time domain, the corresponding dependencies appear to be strictly linear
for any statistically representative benchmark (like SPEC or Winstone).
Extrapolation of the runtime trendlines down to zero core clock period
(infinitely-fast CPU) gives basis for useful interpretation of system behavior. "


I am glad you found one of the useful interpretations
of the resulting formulas:
that there is a "memory wall" limit for the
particular platform architecture/implementation.
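Written out, the limit follows directly from the linear time-domain model. The symbols T_0 (frequency-independent runtime), k (core-bound work) and C (score scale) are assumed here for illustration; mapping them to pgerassi's constants as C_1 = C, C_2 = k, C_3 = T_0 assumes his score formula has the form S = C_1 f / (C_2 + C_3 f), which is not reproduced in this thread.

```latex
\begin{aligned}
T(f) &= T_0 + \frac{k}{f}, \\
S(f) &= \frac{C}{T(f)} = \frac{C\,f}{k + T_0\,f}, \\
\lim_{f \to \infty} S(f) &= \frac{C}{T_0}.
\end{aligned}
```

Under this form the ceiling depends only on T_0, the part of the runtime that no increase in core clock can remove.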
I hope you understand that this limit may change
depending on FSB, memory speed, and overall off-chip
latency, for BOTH AMD and Intel systems.

"And that if Hammer reaches around 2000, P4 will never catch up no matter how fast it clocks."

As you can see from the above, this conclusion may
be quite premature. Everything depends on at least three
parameters: the actual value of the "memory wall" limit
for each platform architecture, the true IPC of the CPUs,
and the attainable design frequency. So the results
may vary and are still uncertain. (my $0.02 of FUD ;-)
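Whether one platform can "catch up" with another can be framed under the same hypothetical model: solve for the clock at which platform B's runtime equals platform A's. This minimal sketch assumes each platform follows T = T0 + k/f, with k standing in for the work divided by IPC (lower IPC, larger k); all parameter values and the function names are hypothetical. It shows how the answer hinges on the three quantities listed: the memory wall T0, the IPC (through k), and the attainable frequency.

```python
# Minimal sketch, assuming each platform follows T = T0 + k / f, where T0 is
# the platform's "memory wall" runtime and k lumps together the work and the
# core's IPC (lower IPC -> larger k). All parameter values are hypothetical.
import math


def runtime(f_mhz: float, t0: float, k: float) -> float:
    """Runtime in seconds at core clock f_mhz under T = T0 + k / f."""
    return t0 + k / f_mhz


def catch_up_freq(target_runtime: float, t0: float, k: float) -> float:
    """Clock (MHz) at which a platform with (t0, k) matches target_runtime.

    Returns math.inf when t0 >= target_runtime: the platform's memory wall
    alone is already slower than the target, so no clock rate is enough.
    """
    if t0 >= target_runtime:
        return math.inf
    return k / (target_runtime - t0)


if __name__ == "__main__":
    # Hypothetical platform A running at 2000 MHz.
    t_a = runtime(2000.0, t0=20.0, k=100_000.0)        # 70.0 s
    # Hypothetical platform B: lower IPC (larger k), two possible memory walls.
    print(catch_up_freq(t_a, t0=30.0, k=130_000.0))     # 3250.0 MHz
    print(catch_up_freq(t_a, t0=75.0, k=130_000.0))     # inf: never catches up
```

With the lower memory wall, the hypothetical platform B needs 3250 MHz to match A at 2000 MHz; with the higher one, its memory wall alone exceeds A's runtime and no clock rate suffices. Small changes in T0 or k move that crossover a long way.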

- Ali