Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: Tenchusatsu who wrote (17595) 11/3/2000 8:50:21 PM
From: dougSF30
 
Tench, Re: "can you show me a heavy-duty application which crunches the FPU but not bandwidth?"

First of all, that's a very different statement from your previous one:

"Thus, it's pointless to try and draw the line between bandwidth and floating-point capabilities, because one without the other is meaningless."

Imagine various applications that perform floating-point operations on large data sets. Depending on the amount of FPU calculation required per unit of data, the bottleneck can be very much the FSB/memory bandwidth OR very much the FPU.

Case 1: Lots of FPU work per unit of data. If the FPU is saturated, moving to DDR or i850 won't help at all.

Case 2: Minimal FPU work per unit of data. Increasing FPU speed will be of no consequence, since bandwidth is the bottleneck.

Changing screen resolutions in Quake III alters the FPU per unit "data" required.

Of course you need "both" (FPU performance & bandwidth) in most applications, but the relative degree to which you need one or the other varies tremendously.
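Doug's two cases reduce to what is now often called arithmetic intensity: flops performed per byte moved through the memory system. A minimal sketch (not from the original post; the per-element numbers are made-up illustrations, not measurements of any real application):

```python
# Arithmetic intensity = flops per byte of memory traffic.
# High intensity -> FPU-bound (faster memory barely helps).
# Low intensity  -> bandwidth-bound (a faster FPU barely helps).

def arithmetic_intensity(flops_per_element, bytes_per_element):
    """Flops performed for each byte read from memory."""
    return flops_per_element / bytes_per_element

# Case 1: many flops per 8-byte double (hypothetical heavy kernel).
case1 = arithmetic_intensity(flops_per_element=500, bytes_per_element=8)

# Case 2: a single flop per 8-byte double (e.g. summing an array).
case2 = arithmetic_intensity(flops_per_element=1, bytes_per_element=8)

print(case1, case2)  # 62.5 0.125
```

The same kernel can land anywhere on this spectrum, which is Doug's point about Quake III: changing the resolution changes the flops required per unit of data fetched.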

Doug



To: Tenchusatsu who wrote (17595) 11/3/2000 10:29:36 PM
From: minnow68
 
Tenchusatsu,

You wrote "can you show me a heavy-duty application which crunches the FPU but not bandwidth?"

Yes.

An example is multiple linear regression. The size of the accumulation array grows with the square of the number of variables, so for a regression with 178 or fewer variables (very few problems use that many), the entire array fits in a 256K L2 cache. For a 178-variable regression, one performs roughly 64K floating-point ops for each 1424 bytes of memory read. Put another way, the ratio of flops to bytes of bandwidth is about 45.
This problem is of extreme practical interest. Regression is used extensively in many fields.
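The figures above can be checked directly. A quick sketch, assuming 8-byte doubles, a 178x178 accumulation array, and a multiply-add counted as 2 flops (all assumptions consistent with the post's numbers, not stated in it):

```python
# Checking the regression arithmetic from the post.
v = 178                             # number of regression variables
l2_bytes = 256 * 1024               # 256K L2 cache = 262,144 bytes
matrix_bytes = v * v * 8            # v x v accumulation array of doubles
row_bytes = v * 8                   # one observation row read from memory
flops_per_row = 2 * v * v           # one multiply-add per array entry (~64K)
ratio = flops_per_row / row_bytes   # flops per byte of bandwidth

print(matrix_bytes, matrix_bytes <= l2_bytes)  # 253472 True
print(round(ratio, 1))                         # 44.5, i.e. "about 45"
```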

Similarly, there are other mathematical problems where complexity grows even faster. The flops needed for regression scale with the square of the number of variables, but there are problems where calculation time scales with the cube or worse. For these problems, extra bandwidth does not help much as long as the working set fits in cache.
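A standard example of cubic scaling (my illustration, not from the post) is dense matrix multiplication: the data is O(n^2) bytes but the work is O(n^3) flops, so the flops-per-byte ratio grows linearly with n, and bandwidth matters less and less once the matrices fit in cache.

```python
# Dense n x n matrix multiply: O(n^3) flops on O(n^2) data, so
# arithmetic intensity grows with n.
def flops_per_byte(n, elem_bytes=8):
    flops = 2 * n ** 3                   # multiply-add per inner-product step
    data_bytes = 3 * n * n * elem_bytes  # the A, B, and C matrices
    return flops / data_bytes            # simplifies to n / 12 for doubles

print(flops_per_byte(100))   # ~8.3 flops per byte
print(flops_per_byte(1000))  # ~83.3 flops per byte
```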

Mike