Formerly About Advanced Micro Devices


To: Brian Hutcheson who wrote (45535), 1/12/1999 1:04:00 AM
From: Petz
 
Brian, on doubling the bus width: technically, this doesn't improve latency (the time to get the first instruction or piece of data), it only improves throughput. That's why the K7 with a 200 MHz bus and DDR SDRAM looks good to me.
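
A rough way to see the latency-vs-throughput distinction, as a sketch with illustrative numbers (the latency and bus figures below are assumptions, not from the post):

# Hypothetical numbers: ~60 ns DRAM access latency and a 64-bit, 100 MHz
# bus (0.8 bytes/ns). Doubling the bus width doubles bytes/ns but leaves
# the fixed latency term untouched.
def time_to_fetch(bytes_needed, latency_ns, bytes_per_ns):
    # total time = fixed access latency + transfer time
    return latency_ns + bytes_needed / bytes_per_ns

narrow = time_to_fetch(32, 60, 0.8)   # 32-byte cache line, 64-bit bus
wide   = time_to_fetch(32, 60, 1.6)   # doubled bus width

print(narrow)  # 100.0 ns
print(wide)    #  80.0 ns -> transfer time halves, but the 60 ns latency
               #  (time to the first word) is unchanged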

Petz



To: Brian Hutcheson who wrote (45535), 1/12/1999 1:09:00 AM
From: Scumbria
 
Brian,

<but you still have to get the data from RAM into L2 cache and that is a bottleneck>

Increasing the bus width would have almost no impact on performance.

The jump from a 32-bit non-pipelined bus on the 486 to a 64-bit pipelined Pentium bus was significant because of the small caches and the fact that it reduced linefill time from 8/16 clocks to 4 clocks.

Further increases in bus width or clock speed will not show the same kind of performance improvement. The most you could hope for with a 128-bit bus would be a reduction of two clocks per linefill.
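
To make the linefill arithmetic concrete, here is a small sketch. It assumes a 32-byte cache line and a pipelined bus that moves one bus-width chunk per clock; the 486's 8/16-clock range is modeled by allowing extra clocks per transfer on its non-pipelined bus. These assumptions are mine, not stated in the post:

LINE_BYTES = 32  # assumed cache line size

def linefill_clocks(bus_bits, clocks_per_transfer=1):
    # number of bus transfers needed to fill one cache line
    transfers = LINE_BYTES // (bus_bits // 8)
    return transfers * clocks_per_transfer

print(linefill_clocks(32, 1))   # 8  clocks
print(linefill_clocks(32, 2))   # 16 clocks -> the 8/16 range for the 486
print(linefill_clocks(64, 1))   # 4  clocks -> Pentium's 64-bit pipelined bus
print(linefill_clocks(128, 1))  # 2  clocks -> a 128-bit bus saves only 2 clocks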

Scumbria



To: Brian Hutcheson who wrote (45535), 1/12/1999 11:03:00 AM
From: Ali Chen
 
Brian, <but you still have to get the data from RAM into L2 cache and that is a bottleneck>

No, that is not true. Let's say Winstone has 10 applications with 2 MB of code each, plus it manages about 80 MB of data during the benchmark. That is roughly 100 MB of data to load. With a real memory bandwidth of 100 MB/s (as measured by the STREAM benchmark; a P-II goes up to 200), it would take only ONE SECOND to load all that stuff into the caches. Yet the net benchmark run time is no less than 10 minutes, or 600 seconds. Therefore the data/code loads you mention are not the bottleneck. In reality a computer loads/stores a bit more than that, but the overall rule for caches still holds: "load once, execute many." That's why the external bandwidth requirements are minimized, as Scumbria-RYBA noted.
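
The back-of-the-envelope arithmetic in this post can be redone directly; this sketch just uses the post's own figures:

code_mb  = 10 * 2          # 10 applications, ~2 MB of code each
data_mb  = 80              # data touched during the benchmark
total_mb = code_mb + data_mb          # ~100 MB to pull in from RAM

stream_bw_mb_s = 100       # sustained bandwidth per STREAM (P-II: up to ~200)
load_time_s    = total_mb / stream_bw_mb_s   # ~1 second to load it all

run_time_s = 10 * 60       # benchmark runs no less than ~10 minutes
print(load_time_s, run_time_s, load_time_s / run_time_s)
# -> 1.0 s vs. 600 s: loading code/data is well under 1% of the run time,
#    so memory-to-cache fills are not the bottleneck.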