Politics : Formerly About Advanced Micro Devices


To: Road Walker who wrote (123380)  8/31/2000 1:41:21 PM
From: Scumbria
 
John,

P4 is clearly based on some very sound concepts. The deep pipeline, good branch prediction, wide vector processing, and high-speed bus are all great ideas.

The problem seems to be in the implementation. A small, low-latency cache will cripple both IPC and clock speed. The double-pumped ALU will also severely clamp the clock speed.

They should have accepted the IPC problems associated with a deep pipeline, and gone for super high clock frequency. It appears that they were torn between IPC and MHz, and achieved neither.

Scumbria



To: Road Walker who wrote (123380)  8/31/2000 2:05:09 PM
From: kash johal
 
John,

re P4 design approach

I think you may have a point here.

Most desktop apps run fine on a 500MHz-plus machine today.

So the REAL need will likely be in new apps that are likely to be FPU intensive.

SSE2 seems pretty good, as AMD will be copying it for Sledgehammer.

Unfortunately for PIV, the new apps that might need SSE2 are not "real issues" today.

Perhaps when broadband access is universal, say in 2-3 yrs, it will seem like a smart move.

Unfortunately, if PIV is a dog at today's apps and benchmarks, it may be the wrong product at the wrong time.

In fact you can say similar things about the Athlon to an extent. Its FPU is awesome compared to PIII for certain benchmarks. But its integer performance per clock was SLOWER than K6-III's. And the oft-talked-about 200MHz bus didn't do much either.

Couple PIV with a large die size and dual Rambus and we may have a recipe for disaster in the desktop market.

The good news for Intel seems to be Tualatin. Move the PIII to 512K cache, a 400MHz bus, and 0.13 micron and you probably have a pretty good chip.

Just my .02.

regards,

Kash



To: Road Walker who wrote (123380)  8/31/2000 2:10:35 PM
From: Petz
 
John, I think P4's design is a mixture of sound engineering, questionable marketing and (to quote Scumbria) poor implementation:
1. The marketers said, "MHz at any cost."
2. The engineers said, "For multimedia we need high bandwidth, and latency is not as important as it used to be."
3. Then the engineers said, "We're going to need more ALUs and FPUs to process the bandwidth."

The marketers vetoed that idea -- it would have made the die size too big. (It WAS too big already.) So the engineers settled for the double-pumped integer ALU. I wouldn't blame the marketers totally. The engineers should have been able to do what they did in less silicon, IMHO. This thing should not be over 200 mm², so something else must have gone wrong.

It's a poor compromise because it lowers the maximum MHz on 0.18µ to only slightly above what a copper 0.18µ process (AMD's) can get to. And there's not much code, multimedia or otherwise, that needs loads of integer number-crunching power.

A double pipe ALU and a double pipe FPU are what was really needed.

So, the tremendous bandwidth will go to waste since there's not enough guts to process the data. I suspect that will be corrected on the 0.13µ version of Willy. In fact, maybe Willy was never intended for 0.18µ anyway, and the only reason it's even being sort-of-produced is bragging rights with AMD.

Add a few more functional units, put it on a 0.13µ copper process, and eliminate the dependency on high-latency RDRAM, and Willy will be formidable. How long will that take?

Petz