Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: DRBES who wrote (2838) | 7/29/2000 9:27:56 AM
From: Scumbria
 
DARBES,

its very deep, 20 stage, pipeline may be a programmer's nightmare since it obligates timing of calculation parallels to nightmare proportions.

There are very simple dataflow algorithms in use which make the problem of forwarding data in a deep pipeline manageable. Basically, each renamed result register is given a tag, which is snooped by the rest of the pipe as a potential operand source.
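To make the tag-snooping idea concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions, not a model of any real core: the names `Operand`, `WaitingOp`, and `broadcast` are hypothetical, each operation has exactly two operands, and a finished result is announced to every in-flight operation at once.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Operand:
    tag: Optional[int] = None    # tag of the renamed result this operand waits for
    value: Optional[int] = None  # filled in once the value is known

    @property
    def ready(self) -> bool:
        return self.value is not None


@dataclass
class WaitingOp:
    name: str
    a: Operand
    b: Operand

    def snoop(self, tag: int, value: int) -> None:
        # Each operand still waiting compares the broadcast tag with its own.
        for operand in (self.a, self.b):
            if not operand.ready and operand.tag == tag:
                operand.value = value

    @property
    def ready(self) -> bool:
        return self.a.ready and self.b.ready


def broadcast(window, tag, value):
    """Announce a finished result to every op in flight; return the ready ones."""
    for op in window:
        op.snoop(tag, value)
    return [op.name for op in window if op.ready]


# Hypothetical window: "add" waits on tag 7, "mul" waits on tags 7 and 9.
window = [
    WaitingOp("add", Operand(tag=7), Operand(value=3)),
    WaitingOp("mul", Operand(tag=7), Operand(tag=9)),
]
ready_after_7 = broadcast(window, tag=7, value=10)
ready_after_9 = broadcast(window, tag=9, value=2)
```

No central controller tracks which stage holds which value here: each waiting operation decides locally, by tag match, when its inputs have arrived.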

Most x86 operands come from memory rather than the register file, so a similar snooping scheme is required between the store queue and the cache.
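One reading of that store-queue snooping is that a load must check pending stores before taking its value from the cache. The sketch below assumes that reading. The function `load` and the data layout are hypothetical, and it ignores partial overlaps by matching whole addresses only.

```python
def load(address, store_queue, cache):
    """Return the value a load should see.

    store_queue: list of (address, value) pairs, oldest first, holding
                 stores that have not yet been written to the cache.
    cache:       dict mapping address -> value.
    """
    # Search youngest-first so the most recent pending store wins.
    for store_addr, store_value in reversed(store_queue):
        if store_addr == address:
            return store_value
    # No pending store matches, so the cache holds the current value.
    return cache[address]


cache = {0x100: 1, 0x104: 2}
store_queue = [(0x100, 11), (0x108, 33), (0x100, 12)]

forwarded = load(0x100, store_queue, cache)   # matched by a pending store
from_cache = load(0x104, store_queue, cache)  # no pending store to this address
```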

It would be difficult to manage the state of a deep pipe through old-fashioned centralized control, but dataflow techniques make deep pipelines quite manageable.

Scumbria



To: DRBES who wrote (2838) | 7/29/2000 3:52:43 PM
From: RDM
 
<its very deep, 20 stage, pipeline may be a programmer's nightmare since it obligates timing of calculation parallels to nightmare proportions.>

What a programmer needs to know about the pipeline does not change whether it has 12, 20, or 28 stages. All of these are equivalent on the nightmare scale (infinite) when trying to hand-optimize assembler code or hand-tune C code.

What surprised me in the integer results, where the 1.4 GHz Willamette came in below the 1.0 GHz P3 (these are the most important benchmarks for many common applications), is that the Willamette did not prevail despite its faster clock. My conclusion, applying Scumbria's Law, is that either the Willamette is capable of being clocked faster than 1.4 GHz or there is some kind of optimal pipeline depth for a given design.

My remaining question is: what is the strength of the part, since it does not do integer well? The Willamette's chip size is greater, its power is much greater, and its cost will be high (Pentium Pro style). What good is it? Will it do double-precision floating point faster than Athlon (for workstations)?