Politics : Formerly About Advanced Micro Devices


To: Cirruslvr who wrote (120399), 7/17/2000 1:22:59 AM
From: Epinephrine
 
RE:<RE: Thunderbird/Durons 64bit L2 cache data path>

Cirruslvr,

I don't presume to answer for Scumbria, but I think I remember him saying that Coppermine uses more exotic (complex) caching algorithms, versus much more conservative algorithms on the Athlon, with respect to how information is moved around, prefetched, etc. I don't pretend to understand it all. Just for whatever it's worth...

Regards,

Epinephrine



To: Cirruslvr who wrote (120399), 7/17/2000 1:56:20 AM
From: Charles R
 
Cirruslvr,

< amd.com >

Nice link. Thanks for posting.

<Potential answers:>

I am no Scumbria, but see my comments.

<1. Prior to the L2 cache going on die, the P6 was ALWAYS underfed, but is a VERY efficient design whose full performance wasn't attainable until the cache went on die.>

There is no question that a 32 KB L1 cache is a bottleneck for standard Windows x86 code.

<2. Athlon's HUGE L1 cache SIGNIFICANTLY reduces the need for L2 cache access, and therefore doesn't benefit as much as PIII from faster L2 cache.>

That is a given.

<3. AMD's interpretation of "no significant impact" is significantly different from mine.>

Unless there is a stated number, who knows what "no significant impact" means.

<4. There is a bottleneck somewhere within the Athlon.>

Yes. It is the extra latency on the L1. That is by design. Nowadays architectures are tuned for MHz. I continue to be amused by the relative performance discussions of PIII and Athlon on a clock-for-clock basis. Due to increasingly longer pipelines, the IPC for typical application code has nowhere to go but down, barring some heretofore unknown breakthrough. The game is about MHz and Athlon is the winner by a long shot. PIII can't keep up, even with notched gates, die shrinks, tight layouts, a massive amount of speed-path work, etc.

<5. The P6 is just THAT good.>

See above. On integer code, IPC improvements will be minimal. The Athlon FPU is clearly better architected, and there should be a significant delta there with equivalent platforms and compilers.

<What do you think after reading AMD's explanation?>

No new ground broken. Improved understanding of L1 miss penalty when the victim buffer is full.

Chuck



To: Cirruslvr who wrote (120399), 7/17/2000 2:29:08 AM
From: Scumbria
 
Cirrus,

This is cool. We actually have something technical to discuss on the thread!

I only partially buy AMD's justification for the narrow L2 datapath. The argument that the large L1 minimizes L2 accesses is quite valid. However, Pete Gerassi described the 11-cycle latency of the L2 as being largely due to 4 cycles of L1 linefill and 4 cycles of L1 eviction (because of the narrow datapath).

Eleven cycles of L2 latency is too long, and undoubtedly has a significant impact on performance. The T-Bird L2 is certainly less important than the PIII L2, but a faster L2 would improve benchmark scores by several percent.
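
To put rough numbers on that breakdown, here is a minimal Python sketch. The 11, 4 and 4 cycle figures are the ones quoted above and the 64-bit width is the one in the thread subject. The function name and the rule that transfer cycles shrink in proportion to datapath width are assumptions for illustration only, not anything AMD has stated.

```python
import math

L2_LATENCY = 11       # total L2 latency in cycles (quoted above)
LINEFILL = 4          # cycles of L1 linefill (quoted above)
EVICTION = 4          # cycles of L1 eviction (quoted above)
BASE_WIDTH_BITS = 64  # L2 datapath width from the thread subject

def l2_latency(width_bits):
    """Estimate L2 latency for a hypothetical datapath width.

    Assumption (not from the thread): linefill and eviction cycles scale
    inversely with width, and the remaining cycles are fixed overhead.
    """
    fixed = L2_LATENCY - LINEFILL - EVICTION
    scale = BASE_WIDTH_BITS / width_bits
    return fixed + math.ceil(LINEFILL * scale) + math.ceil(EVICTION * scale)

# Compare the actual 64-bit path with two hypothetical wider ones.
for width in (64, 128, 256):
    print(width, "bits:", l2_latency(width), "cycles")
```

Under that simple scaling assumption, most of the 11 cycles are transfer time rather than fixed overhead, which is the point of the complaint about the narrow datapath.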

Scumbria



To: Cirruslvr who wrote (120399), 7/17/2000 11:14:27 AM
From: Scumbria
 
Cirrus,

There tends to be very little difference in clock-for-clock performance between different x86 implementations. This is because everyone is limited largely by memory latency. The memory subsystem consists of four main components:

1. L1 cache
2. L2 cache
3. DRAM
4. Virtual memory on disk

Athlon does better than PIII on #1 and #3. PIII does better on #2. #4 is independent of CPU design.
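
The trade-off between #1 and #2 can be sketched with the standard average memory access time (AMAT) formula. This is a rough Python illustration only: every hit time, miss rate and latency below is a hypothetical placeholder, not a measurement of Athlon or PIII, and disk (#4) is left out since it is independent of CPU design.

```python
def amat(l1_hit, l1_miss, l2_hit, l2_miss, dram):
    """Average memory access time in cycles for an L1/L2/DRAM hierarchy.

    l1_hit, l2_hit, dram are latencies in cycles; l1_miss and l2_miss
    are miss rates between 0 and 1.
    """
    return l1_hit + l1_miss * (l2_hit + l2_miss * dram)

# Hypothetical design A: large L1 (low miss rate) with a slower L2.
design_a = amat(l1_hit=3, l1_miss=0.02, l2_hit=11, l2_miss=0.25, dram=100)

# Hypothetical design B: small L1 (higher miss rate) with a faster L2.
design_b = amat(l1_hit=3, l1_miss=0.05, l2_hit=7, l2_miss=0.25, dram=100)

print("design A:", design_a, "cycles")
print("design B:", design_b, "cycles")
```

In this model each extra cycle of L2 latency costs only as much as the L1 miss rate, so a design with a large L1 is less sensitive to a slow L2, though not immune to it.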

All the discussion of 6th vs. 7th generation architectures is fairly meaningless. Next year Intel will be emphasizing MHz (Willy), and AMD will be emphasizing CMP multiprocessing (Sledgehammer).

My bets are on AMD.

Scumbria