Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: Petz who wrote (5244) 8/16/2000 1:54:39 PM
From: pgerassi
 
Dear John:

Re: Best Case?

If the whole program fits into the L1, that is the best case for the P4 over the P3, not the worst case. In business benchmarks, the bottlenecks are outside the CPU. That is why a large increase in CPU horsepower does not change those benchmark scores by much. It means that even a 2 GHz P4 will not score much higher than a 1 GHz P3, and the breakeven will occur at a higher ratio than 1.4. The best programs for the P4 are simulations and games, where the data quantities are large and the programs are small, so the trace cache rarely misses. If the trace cache does miss, each call or branch will cause an 8-cycle wait while the trace cache begins to fill with decoded instructions. That would push the ratio even higher. Assuming that SSE2 execution is not bungled and the FSB can keep the CPU fed, a trace cache that almost never misses is where the breakeven might get down to 1 to 1.
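To put rough numbers on the "bottlenecks are outside the CPU" point, here is a minimal Amdahl-style sketch. The function name `relative_score` and the CPU-bound shares are hypothetical, picked only for illustration; none of them are measurements of any real benchmark.

```python
def relative_score(cpu_speedup: float, cpu_fraction: float) -> float:
    """Benchmark score relative to baseline when only the CPU-bound part speeds up.

    cpu_fraction is the share of baseline run time that is CPU-bound
    (a hypothetical value; no figure is given in the post). Everything
    else (disk, memory, video) is assumed not to speed up at all.
    """
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if cpu_speedup <= 0.0:
        raise ValueError("cpu_speedup must be positive")
    # New run time, with the baseline run time normalized to 1.0.
    new_time = (1.0 - cpu_fraction) + cpu_fraction / cpu_speedup
    return 1.0 / new_time


if __name__ == "__main__":
    # Double the CPU throughput and see how little the score can move.
    for fraction in (0.2, 0.5, 1.0):
        score = relative_score(2.0, fraction)
        print(f"CPU-bound share {fraction:.0%}: score x{score:.2f}")
```

Under these assumptions, the smaller the CPU-bound share, the less a doubling of CPU throughput shows up in the score. Only a fully CPU-bound program sees the full factor of two.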

Unfortunately, these are also the types of applications at which the Duron, Tbird, Mustang, and especially the Sledgehammer excel. Do you think that the P4 could stay with a same-clocked Tbird? A Mustang? There would not even be a chance that the P4 could stay with a Sledgehammer at the same clock, possibly not even at more than 2 times the clock. What do you think the minimum clock ratio must be for the P4 against the Duron, Tbird, Mustang, and the "Hammer" series to be saleable?

IMHO the real reason for the P4 delays is that they are having trouble getting the clock high enough to make up for the disadvantage of the lower IPC. They keep trying small changes (more silicon respins) to better balance the pipe stages, hoping to find the sweet spot where overall performance is best (and hopefully above the profitability breakeven point).
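The clock-versus-IPC trade-off can be sketched with the usual first-order model, throughput = IPC x clock. This is a simplification that ignores memory and FSB effects, and the relative IPC values below are illustrative assumptions, not measured P4 figures.

```python
def breakeven_clock_ratio(relative_ipc: float) -> float:
    """Clock ratio a chip needs to match a rival, given its IPC relative to that rival.

    Assumes throughput = IPC * clock, so equal throughput requires
    clock_ratio * relative_ipc == 1.
    """
    if relative_ipc <= 0.0:
        raise ValueError("relative_ipc must be positive")
    return 1.0 / relative_ipc


if __name__ == "__main__":
    # Hypothetical IPC deficits, for illustration only.
    for relative_ipc in (0.9, 0.8, 0.7, 0.5):
        ratio = breakeven_clock_ratio(relative_ipc)
        print(f"IPC at {relative_ipc:.0%} of rival: needs {ratio:.2f}x the clock")
```

In this model a chip with half the rival's IPC needs twice the clock just to break even, which is why every pipe-stage rebalance that buys clock speed matters so much.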

Pete



To: Petz who wrote (5244) 8/16/2000 1:55:41 PM
From: kash johal
 
Petz,

re: Willy release

I can see the 10-15% decrease in IPC.

However, rumors are that Willy won't even be offered at 1.3 GHz but will start at 1.4 GHz to 1.5 GHz.

If true, it likely means that the vast majority of parts will yield above 1.4 GHz.

The bad news is that the sweet spot is likely to be > 1.5 GHz.

Speed distribution is usually at least +-15-20% around the sweet spot.

Clearly, with low initial volumes they would offer parts at the lowest speed splits and ramp speed as volumes increase.

As Willy volumes ramp, yields of 1.6-1.8 GHz Willy parts are likely.
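The speed-bin arithmetic can be sketched as follows. It assumes the lowest offered bin sits exactly at the bottom edge of a symmetric distribution around the sweet spot, which is a simplification; the function names are hypothetical, and the 1.4 GHz and 15-20% figures are the ones quoted in this post.

```python
def speed_range(sweet_spot_ghz: float, spread: float) -> tuple[float, float]:
    """Lowest and highest bin speeds for a +-spread distribution around the sweet spot."""
    if not 0.0 <= spread < 1.0:
        raise ValueError("spread must be in [0, 1)")
    return sweet_spot_ghz * (1.0 - spread), sweet_spot_ghz * (1.0 + spread)


def sweet_spot_from_lowest_bin(lowest_ghz: float, spread: float) -> float:
    """Sweet spot implied if the lowest bin sits at sweet_spot * (1 - spread)."""
    if not 0.0 <= spread < 1.0:
        raise ValueError("spread must be in [0, 1)")
    return lowest_ghz / (1.0 - spread)


if __name__ == "__main__":
    # 1.4 GHz lowest bin with the +-15% and +-20% spreads from the post.
    for spread in (0.15, 0.20):
        sweet = sweet_spot_from_lowest_bin(1.4, spread)
        low, high = speed_range(sweet, spread)
        print(f"spread +-{spread:.0%}: sweet spot {sweet:.2f} GHz, "
              f"bins {low:.2f} to {high:.2f} GHz")
```

Under that assumption, a 1.4 GHz bottom bin implies a sweet spot somewhere between roughly 1.65 and 1.75 GHz, which is in line with the 1.6-1.8 GHz estimate.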

So overall a pretty good solution, considering that the PIII gives up the ghost at 1.1 GHz.

I think Mustang at 1.5 GHz is likely coming with AMD's 0.13 front end.

If AMD has added SSE2, then Mustang should really kick butt on the ole benchmarks.

Interesting few months coming up.

regards,

Kash



To: Petz who wrote (5244) 8/16/2000 3:41:08 PM
From: jcholewa
 
> SSE2 includes multiply/accumulate instructions. Is it possible that P4 can do this in one cycle?

Hmm. There's PMADDWD. But I should note that this is an integer operation that works on MMX or XMM registers. I'm uncertain as to whether or not there is an SSE or SSE2 floating point mac instruction. It probably would be called something like MADDPS or MADDPD, but I didn't find anything like that in the instruction set reference for Willamette/P4 ( developer.intel.com ). If I'm mistaken, please let me know (the old eyes wear thin after a time, and I seem well past my prime).

Anyway, I was pointing that out because the message to which you replied referred primarily to floating-point computation.
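For reference, what PMADDWD computes can be emulated in a few lines of Python. This is only a sketch of the arithmetic (signed 16-bit words multiplied pairwise, adjacent 32-bit products added); it says nothing about latency or how the hardware implements it, and the helper names are made up for illustration.

```python
def _to_int32(value: int) -> int:
    """Wrap an integer to signed 32-bit two's complement."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def pmaddwd(a: list[int], b: list[int]) -> list[int]:
    """Emulate PMADDWD on two equal-length lists of signed 16-bit words.

    Use 4 words per operand for an MMX register, 8 for an XMM register.
    Each output dword is a[i]*b[i] + a[i+1]*b[i+1] for even i.
    """
    if len(a) != len(b) or len(a) % 2 != 0:
        raise ValueError("operands must have the same even number of words")
    for word in (*a, *b):
        if not -32768 <= word <= 32767:
            raise ValueError("each word must fit in a signed 16-bit value")
    return [
        _to_int32(a[i] * b[i] + a[i + 1] * b[i + 1])
        for i in range(0, len(a), 2)
    ]


if __name__ == "__main__":
    # Four words in, two dwords out (MMX-sized operands).
    print(pmaddwd([1, 2, 3, 4], [5, 6, 7, 8]))
```

Note that the instruction multiplies and adds adjacent pairs within one operation. It does not accumulate into a separate running total, so a full dot product still needs a follow-up add.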

Speaking of an integer mac. Hmmm ... can it be done in one cycle? Sure, I guess it's possible. But I think it'd have to be a pretty complex stage that could hurt the frequency ramp, so it might not be wise to have a one cycle mac. Of course, integer operations are much less complicated than fp ops, so I may be having my thoughts swayed by originally thinking about fp stuff (in other words: "I haven't a clue").

-JC