Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: Charles Gryba who wrote (53591) 9/1/2001 9:03:05 PM
From: wanna_bmw
 
Constantine, I think Tench's point was not about the performance gains on individual specs, but rather that you can't identify the bottlenecks in a system just by comparing the specs of internal components as if you were making an apples-to-apples comparison. Tench's response was mainly directed at Dan3's stupid comment that the Pentium 4 actually executes at half speed, which he's been trying to push on both the Intel and AMD threads for the past couple of days.

The difference here is that first, Dan is incorrect. The trace cache issues every other clock, but the decoder decodes on every single clock. Second, the trace cache issues twice as much on every other clock as the Athlon does per clock, so on average, they end up being nearly equal. The Pentium 4 IPC penalty comes mostly from instructions that have higher execution latencies than they do on the Athlon. By raising the latency of certain instructions, Intel was able to simplify a lot of integer and floating point logic, which enabled them to design the long Netburst pipeline. But then again, Intel designed a new instruction extension, SSE, which lets them keep the execution logic simple without having to raise execution latencies. Check out the software specs for the Pentium 4, and you will see that SSE/SSE-2 instructions have lower latencies than the general purpose instructions, mostly because they don't have to follow the same strict rules of the IEEE FP standard, such as the handling of denormalized numbers.
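
The averaging argument above can be checked with a little arithmetic. The sketch below is only an illustration: the function name and the width of 3 operations are assumptions picked for the example, not figures taken from Intel or AMD documentation.

```python
from fractions import Fraction


def average_issue_rate(ops_per_burst, clocks_per_burst):
    """Average operations issued per clock for a unit that issues
    ops_per_burst operations once every clocks_per_burst clocks."""
    return Fraction(ops_per_burst, clocks_per_burst)


# Hypothetical width: assume the per-clock design issues 3 ops every clock.
PER_CLOCK_WIDTH = 3

every_clock = average_issue_rate(PER_CLOCK_WIDTH, 1)
# Twice the width, but only on every other clock.
every_other_clock = average_issue_rate(2 * PER_CLOCK_WIDTH, 2)

print(every_clock, every_other_clock)
```

Issuing in bursts changes when the work arrives, not how much arrives on average, which is why a "half speed" reading of the spec sheet is misleading.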

There are trade-offs in many facets of CPU design, but the point is that people like Dan3 are not going to be able to catch a performance failure just by reading the specs, so his posts spreading FUD about the micro-architecture are completely pointless. Both Netburst and K7 follow their own design philosophies. One makes up for lower per-clock performance by clocking at very high frequencies, and the other gets more performance out of processing data more efficiently. Which one ends up scaling better in performance relative to frequency in the future will depend on a lot of things, including software, and I think it's far too early to pass judgement on that issue.

wanna_bmw



To: Charles Gryba who wrote (53591) 9/1/2001 9:16:37 PM
From: Tenchusatsu
 
Constantine, <I thought the Athlon gained a LOT when it went from the off chip cache to the on-chip cache even though the cache size was halved. So CombJelly's argument has some merit.>

The improvement came from the lower latency, not necessarily from the increase in speed.

CombJelly made it sound like the entire P4 processor actually slowed down at times. It doesn't.

Tenchusatsu



To: Charles Gryba who wrote (53591) 9/2/2001 9:30:38 PM
From: fyodor_
 
Constantine: I thought the Athlon gained a LOT when it went from the off chip cache to the on-chip cache even though the cache size was halved.

Actually, the increase in performance was very minimal, at least for the 1/2 speed cache chips (the 1/3 speed did exhibit a higher increase). AMD had some amazing (and expensive) off-die L2 cache - much better than what Intel sported. This was traded in for some... mediocre... on-die L2 cache (mediocre by on-die standards - it was still faster than the off-die cache, but the increase in cache speed was pretty much negated by halving the size).
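
One rough way to see how a faster cache can be nearly cancelled out by halving its size is the usual average-access-time estimate: hit time plus miss rate times miss penalty. The numbers below are made up purely for illustration and are not measured Athlon figures; the function name is likewise just for this sketch.

```python
def average_access_time(hit_time, miss_rate, miss_penalty):
    """Average L2 access time in CPU clocks: hit time plus the
    miss rate times the extra cost of going to main memory."""
    return hit_time + miss_rate * miss_penalty


# All values are hypothetical, chosen only to show the trade-off.
MISS_PENALTY = 100  # extra clocks to reach main memory

# Larger but slower off-die cache: slower hits, fewer misses.
off_die = average_access_time(hit_time=20, miss_rate=0.05, miss_penalty=MISS_PENALTY)
# Half-size but faster on-die cache: faster hits, more misses.
on_die = average_access_time(hit_time=12, miss_rate=0.12, miss_penalty=MISS_PENALTY)

print(off_die, on_die)
```

With these assumed inputs the on-die cache comes out only slightly ahead, because the extra misses from the smaller size eat most of the gain from the faster hits.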

-fyo