Technology Stocks : Advanced Micro Devices - Moderated (AMD)

To: Joe NYC who wrote (27460), 2/5/2001 3:18:10 PM
From: Dan3
 
Some thoughts expressed on ACE's tech board:

There have been two changes in the computing universe since P4 was designed that may have undermined some of the work of P4's designers: RDRAM didn't ramp as well or as soon as expected, while CPU clock speeds ramped more quickly than expected. Rambus was supposed to be shipping PC800 at commodity prices a year ago, and to be shipping at well over a GHz by now. At the same time, P4 was expected to be around 1.2GHz, and PIII/Athlon around 800MHz. So P4 was designed with the expectation that the bandwidth of its memory bus would keep up with P4's clock speed. It has a very fast, very small, 8K 4-way L1 data cache, a 12K uops instruction cache, and a large, fast, moderately wide and deep 8-way 256K unified L2. P4 relies heavily on prefetch, counting on the large memory bandwidth it expects to be available, and cycles its caches regularly - prefetching several times as many bytes as are actually used (but making some bytes available much sooner than would otherwise be the case).

In contrast, Athlon has two large caches - a very deep 128K 2-way L1 Harvard style cache with separate 64K instruction and 64K data segments, and a wide 16-way 256K unified L2 victim cache. Large code snippets can cycle through the L1 without ever clearing any of the L2's pages, and between the two caches, up to 18 memory ranges with the same low-order address bits (LSBs) can be cached. The Athlon is designed to go to main memory as rarely as possible, and to be (relatively) frugal with whatever bandwidth is available. This is in keeping with what appear to have been more pessimistic assumptions on the part of Athlon's designers regarding memory being able to scale with CPU speeds. This architecture would also be very beneficial if AMD ever manages to ship an SMP system - there would be much less contention for access to main memory.
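
A rough sketch of where the "18" comes from, for anyone who wants to check the arithmetic. The cache sizes and associativities are the ones quoted above; the 64-byte line size and the helper names are my assumptions for illustration, not something stated in the post.

```python
from dataclasses import dataclass

LINE_BYTES = 64  # assumption: 64-byte cache lines (not stated in the post)


@dataclass(frozen=True)
class Cache:
    name: str
    size_bytes: int
    ways: int

    @property
    def sets(self) -> int:
        # sets = capacity / (associativity * line size)
        return self.size_bytes // (self.ways * LINE_BYTES)

    def set_index(self, addr: int) -> int:
        # Drop the offset-within-line bits, then keep the low index bits.
        return (addr // LINE_BYTES) % self.sets


athlon_l1d = Cache("Athlon L1 data", 64 * 1024, 2)
athlon_l2 = Cache("Athlon L2 victim", 256 * 1024, 16)

# A victim L2 holds lines evicted from L1 rather than copies of them,
# so the ways of the two levels add up for one group of conflicting addresses.
total_ways = athlon_l1d.ways + athlon_l2.ways

# Addresses spaced one L1 way (sets * line size) apart share an L1 set,
# and because the L2 has fewer sets they share an L2 set as well.
stride = athlon_l1d.sets * LINE_BYTES
conflicting = [n * stride for n in range(total_ways)]
l1_sets_hit = {athlon_l1d.set_index(a) for a in conflicting}
l2_sets_hit = {athlon_l2.set_index(a) for a in conflicting}

print(total_ways, l1_sets_hit, l2_sets_hit)
```

Under those assumptions, 18 addresses that all map to the same set can be resident at once: 2 in the L1 data cache plus 16 in the L2.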

Athlon shows relatively little benefit moving from PC100, to PC133, to DDR200, to DDR266. The implication is that a 3GHz Athlon with DDR266 won't be too badly crippled in IPC compared to a 1GHz Athlon on PC100. And DDR266 is already starting to ship. Memory bandwidth won't be limiting the Athlon architecture.
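
One way to put numbers on that comparison is peak memory bytes available per CPU clock. This is only a back-of-envelope sketch: it uses nominal peak transfer rates on a 64-bit bus and ignores latency and real-world efficiency, and the function names are mine.

```python
BUS_BYTES = 8  # 64-bit SDRAM/DDR data bus

# Nominal transfer rates in millions of transfers per second.
MEMORY_MT_S = {"PC100": 100, "PC133": 133, "DDR200": 200, "DDR266": 266}


def peak_mb_per_s(memory: str) -> int:
    """Theoretical peak bandwidth in MB/s for one 64-bit channel."""
    return MEMORY_MT_S[memory] * BUS_BYTES


def bytes_per_cpu_cycle(memory: str, cpu_mhz: int) -> float:
    """Peak memory bytes available per CPU clock (MB/s divided by MHz)."""
    return peak_mb_per_s(memory) / cpu_mhz


athlon_1ghz_pc100 = bytes_per_cpu_cycle("PC100", 1000)
athlon_3ghz_ddr266 = bytes_per_cpu_cycle("DDR266", 3000)

print(f"1GHz on PC100:  {athlon_1ghz_pc100:.2f} bytes/clock")
print(f"3GHz on DDR266: {athlon_3ghz_ddr266:.2f} bytes/clock")
```

By this crude measure the 3GHz part on DDR266 gets about 0.71 bytes per clock against 0.80 for the 1GHz part on PC100, so the two are in roughly the same position relative to memory.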

P4 "takes advantage" of the platform provided for it and shows a benefit moving from PC600 to PC800 RDRAM. There have also been pretty strong indications from Rambus Inc. and Intel that P4 without dual channel RDRAM doesn't do nearly as well. The implication here is that even at 1.5GHz, where it provides performance on mainstream applications roughly equal to a 1.2GHz PIII or Athlon, P4 is close to maxing out its performance on any memory subsystem available in the near term. RDRAM channels are as expensive to provide as SDRAM channels due to their need for ground traces and strict layout requirements. So despite the nominal reduction in data path width, the actual width of an RDRAM channel on a motherboard (and the cost of providing it) is about the same as the width and cost of an SDRAM channel. So dual channel RDRAM is not a freebie that comes from the 16 bit data path of RDRAM; it is a costly addition to any system that will be avoided if possible.
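
For reference, the nominal peak numbers behind the channel comparison can be sketched like this. The 16 bit RDRAM data path is from the text above; the 64-bit SDRAM/DDR bus and the transfer rates are the standard nominal figures, and the function name is just for illustration.

```python
RDRAM_BITS = 16  # RDRAM channel data path width
SDRAM_BITS = 64  # SDRAM/DDR channel data path width


def peak_mb_per_s(mt_per_s: int, data_bits: int, channels: int = 1) -> int:
    """Theoretical peak bandwidth in MB/s: transfers/s * bytes per transfer."""
    return mt_per_s * data_bits // 8 * channels


peaks = {
    "PC600 RDRAM, 1 channel": peak_mb_per_s(600, RDRAM_BITS),
    "PC800 RDRAM, 1 channel": peak_mb_per_s(800, RDRAM_BITS),
    "PC800 RDRAM, 2 channels": peak_mb_per_s(800, RDRAM_BITS, channels=2),
    "PC133 SDRAM, 1 channel": peak_mb_per_s(133, SDRAM_BITS),
    "DDR266, 1 channel": peak_mb_per_s(266, SDRAM_BITS),
}

for name, mb_s in peaks.items():
    print(f"{name:24s} {mb_s:5d} MB/s")
```

A single PC800 channel peaks below a single DDR266 channel on paper, which is why the second RDRAM channel matters so much to P4 and why its cost can't be waved away.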

The bottom line is that P4 performance may level off somewhere around 2GHz, at roughly the performance of Tualatin/Athlon at 1.5GHz, even though the core may run at much higher speeds. P4 may already be too demanding of memory bandwidth for even a dual processor system to scale well - unless the P4 is provided with a 3rd or even 4th expensive RDRAM channel, or unless RDRAM speeds can be ramped with P4 clock speeds. RDRAM shipping at much faster speeds was once expected but now looks unlikely.

IMHO, P4's designers were let down by Rambus, and it's hard to see how they will be able to get around that problem.