Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: wanna_bmw who wrote (50852) 8/12/2001 7:21:49 PM
From: combjelly
 
"I'm also guessing that Hammer won't be much different than the K7 core with x86-64 and integrated Hypertransport channels"

According to what AMD has stated so far, the Hammers will have additional pipeline stages. However, they claim the IPC will be higher; I'm not really sure that can be achieved in practice. Sure, the integrated memory interface can help, and maybe a larger L1 and/or a wider path to the L2, a better algorithm for the pre-fetch, etc., but I dunno. Since the current cores have 10 stages, a basic rule of thumb puts the Hammers at about a 20% higher clock rate than Barton. So assuming your max. clock rate of 2.4GHz is accurate for Barton, the Hammers should get to around 2.8GHz, plus or minus. Of course, there could be some speed path limitations, but...



To: wanna_bmw who wrote (50852) 8/12/2001 7:25:18 PM
From: Dan3
 
Re: With aluminum, the capacitance of the wires causes a charging effect, which makes it take a longer amount of time before you get a stable logical value on the wire. With copper, this capacitance is reduced, and the wires themselves can switch faster.

You're confusing the move to copper with the move to SOI. SOI can offer dramatically lower capacitance, which, together with the lower leakage losses and other beneficial effects of SOI like elimination of the "body effect", is why power consumption can be so much lower while performance increases.

The sources of increased SOI performance are elimination of area junction capacitance and elimination of "body effect" in bulk CMOS technology.
chips.ibm.com

Small surface areas have lower capacitance, and copper can allow for small surface areas, but there is no capacitance reduction from substituting copper for aluminum.

For identical line dimensions, copper and aluminum provide the same side-wall capacitance, but copper has lower resistance. Similarly, for copper and aluminum lines with equivalent resistance, copper lines can be thinner, resulting in lower side-wall capacitance.
google.com
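
The resistance-versus-capacitance point in the quote above can be put into numbers with a rough sketch. The resistivities are bulk room-temperature values (thin-film interconnect runs higher), the line dimensions are hypothetical, and treating side-wall capacitance as proportional to line thickness is a parallel-plate simplification rather than anything stated in the quoted source.

```python
# Bulk resistivities at room temperature, in ohm-metres.
# Real thin-film interconnect is more resistive; these are illustrative assumptions.
RHO_CU = 1.68e-8
RHO_AL = 2.65e-8


def wire_resistance(rho, length_m, width_m, thickness_m):
    """Resistance of a rectangular wire: R = rho * L / (w * t)."""
    return rho * length_m / (width_m * thickness_m)


def equal_resistance_thickness(thickness_al_m):
    """Copper thickness giving the same resistance as an aluminum line
    of the same length and width."""
    return thickness_al_m * RHO_CU / RHO_AL


# Hypothetical line: 1 mm long, 0.25 um wide, 0.5 um thick.
length, width, thickness = 1e-3, 0.25e-6, 0.5e-6

r_al = wire_resistance(RHO_AL, length, width, thickness)
r_cu = wire_resistance(RHO_CU, length, width, thickness)
t_cu = equal_resistance_thickness(thickness)

# Same dimensions: same side-wall capacitance, lower resistance for copper.
print(f"same dimensions:  R_cu / R_al = {r_cu / r_al:.2f}")
# Same resistance: copper can be thinner, so side-wall area (and, under the
# parallel-plate assumption, side-wall capacitance) shrinks by the same ratio.
print(f"same resistance:  t_cu / t_al = {t_cu / thickness:.2f}")
```

Either way the gain comes from copper's lower resistivity; any capacitance benefit is a side effect of being able to shrink the line, not of the metal itself.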



To: wanna_bmw who wrote (50852) 8/13/2001 12:37:05 AM
From: pgerassi
 
Wanna_bmw:

You forget to include the limitations of more pipeline stages. First, there is a delay associated with the register needed between stages. As the amount of work done in each stage decreases, this register delay becomes a larger share of each stage's latency and eventually becomes very significant. For example, if the delay for temporary storage between stages (a sort of sample and hold circuit) is 100ps (ps = 1 trillionth of a second) and the operating frequency is 1 GHz, each stage has 900ps to do work and absorb jitter, plus 100ps for the interstage register. If you halve the work done in each stage, 450ps to do work plus 100ps for the interstage register nets 550ps per stage, or 1.8 GHz, not the 2 GHz one would assume. In the first case the interstage register takes 10% of the time, but 18% in the second.
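
The register-overhead arithmetic can be checked with a few lines. This is only a sketch of the example's numbers; the function names are made up for illustration, and the third row (work halved again to 225ps) is an extrapolation that is not in the post.

```python
def clock_ghz(work_ps, register_ps):
    """Clock rate in GHz when each stage needs work_ps plus register_ps."""
    cycle_ps = work_ps + register_ps
    return 1000.0 / cycle_ps  # 1000 ps per cycle corresponds to 1 GHz


def register_share(work_ps, register_ps):
    """Fraction of each cycle spent in the interstage register."""
    return register_ps / (work_ps + register_ps)


REGISTER_PS = 100.0  # interstage register delay from the example

# 900 ps and 450 ps are the two cases in the post; 225 ps is an added
# hypothetical showing the trend if the work is halved once more.
for work_ps in (900.0, 450.0, 225.0):
    ghz = clock_ghz(work_ps, REGISTER_PS)
    share = register_share(work_ps, REGISTER_PS)
    print(f"work {work_ps:5.0f} ps -> {ghz:.2f} GHz, register share {share:.0%}")
```

Each halving of the work buys less than a doubling in clock, because the fixed 100ps does not shrink with it.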

The second problem is pipeline stage imbalances. The cycle time is set by the longest any stage takes to complete its work, plus an allowance for jitter and the interstage delay. So if 10 stages get done in 300ps, 9 stages in 400ps and one takes 550ps, every stage must be given 550ps, and you have a 1.8 GHz pipeline even though the average stage latency of roughly 360ps would give a speed of about 2.8 GHz. A balanced pipeline can make more out of the speed than an unbalanced one even if it is slightly slower in average latency. Many designers have stated that the Athlon pipelines are well balanced. Also, the larger the number of stages, the harder it is to keep any sort of balance in latency.
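
A short sketch of the imbalance example, using the 20 stage latencies given above. The function names are hypothetical, and the "balanced" figure assumes the same total work could be spread perfectly evenly across the stages, which is an idealization.

```python
# Stage latencies in picoseconds from the example:
# ten stages at 300 ps, nine at 400 ps, one at 550 ps.
stage_ps = [300.0] * 10 + [400.0] * 9 + [550.0]


def pipeline_clock_ghz(latencies_ps):
    """The slowest stage sets the cycle time for the whole pipeline."""
    return 1000.0 / max(latencies_ps)


def balanced_clock_ghz(latencies_ps):
    """Clock rate if the same total work were split evenly (idealized)."""
    mean_ps = sum(latencies_ps) / len(latencies_ps)
    return 1000.0 / mean_ps


mean_ps = sum(stage_ps) / len(stage_ps)
print(f"slowest stage: {max(stage_ps):.0f} ps -> {pipeline_clock_ghz(stage_ps):.2f} GHz")
print(f"average stage: {mean_ps:.1f} ps -> {balanced_clock_ghz(stage_ps):.2f} GHz")
```

The exact average comes out a little under 360ps, so the single 550ps stage costs about a third of the clock rate a perfectly balanced pipeline would reach.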

Next is the number of pipeline stalls, or bubbles. A stall happens when a stage is about to do its work but something it needs is not yet ready. The pipeline then stalls from that point back to the front, which creates a bubble of no work done until the requirement is satisfied. Work still proceeds on everything in the pipeline past the idle stage. The longer the pipeline, the more likely this is to occur. The really bad ones come from a mispredicted branch. These can stall 18 stages of a 20 stage pipeline, creating a very big bubble. These bubbles reduce the amount of work that any given pipeline will do per cycle. You can tell that this is very significant from the amount of effort CPU designers put into mitigating it.
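
To get a feel for how much a big bubble costs, here is a crude cycles-per-instruction model. Only the 18-cycle penalty comes from the text above; the base IPC, branch fraction, mispredict rate, and the 8-cycle penalty for a shorter pipeline are hypothetical numbers picked for illustration, and the model ignores every other kind of stall.

```python
def effective_ipc(base_ipc, branch_fraction, mispredict_rate, penalty_cycles):
    """Instructions per cycle after charging each mispredicted branch
    a fixed bubble of penalty_cycles.

    CPI = 1/base_ipc + branch_fraction * mispredict_rate * penalty_cycles
    """
    cpi = 1.0 / base_ipc + branch_fraction * mispredict_rate * penalty_cycles
    return 1.0 / cpi


# Hypothetical workload: 1 in 5 instructions is a branch, 5% are mispredicted,
# and the core would sustain 2 instructions per cycle with no bubbles at all.
BASE_IPC = 2.0
BRANCH_FRACTION = 0.20
MISPREDICT_RATE = 0.05

long_pipe = effective_ipc(BASE_IPC, BRANCH_FRACTION, MISPREDICT_RATE, 18)
short_pipe = effective_ipc(BASE_IPC, BRANCH_FRACTION, MISPREDICT_RATE, 8)

print(f"18-cycle bubble: IPC {long_pipe:.2f}")
print(f" 8-cycle bubble: IPC {short_pipe:.2f}")
```

Under these assumed rates the longer bubble gives up noticeably more IPC, which is the trade a deeper pipeline has to win back in clock rate.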

Lastly, there is a problem with imbalances in the width of a pipeline. If the instructions on what to do next are found in the trace cache of a P4, up to three things can be started at once. If a branch mispredict occurs and the next instruction has not been decoded into the trace cache, only one instruction is decoded per stage and 8 additional stages are added to the time before something is done. This shows that the P4 is highly weighted to the back end. If there are a lot of mispredicted branches, or the working code set is larger than the trace cache, the IPC drops below 1, sometimes by quite a bit. Athlon can decode as fast as it can execute, and it can do up to 6 things at once (with some limitations) per stage. This is also referred to as an imbalance, in this case of the pipelines as a whole. It is easy to confuse the two kinds of imbalance except by context.

It is likely that some of the speed difference between P3 and Athlon is due to the well balanced design and the rest is due to process. Between the P4 and the Athlon, more is due to the very long pipeline, but the limitations above keep the P4 from being 2.8 times faster in clock, and it performs much worse in both IPC and overall performance.

Pete