To: Tony Viola who wrote (127734)
From: Dan3
2/18/2001 12:48:29 PM

Re: Foster DP is on roadmaps to come out at 1.7 and 2.0 GHz, on 0.18. If they do, any AMD server effort is trumped before the hand is even dealt.

The P4 (Willamette/Foster) architecture doesn't appear to be a very appropriate one for server and SMP use. This is no surprise, considering that at the time P4 was being architected, Itanium was expected to be dominating the server and SMP workstation markets before P4 was released. P4 was expected to follow Itanium's release by more than a year and not be used in servers or SMP workstations.

P4 has 128 byte cache lines and no victim cache. So, if Foster comes out with a cache 4 times the size of the present P4, it will be able to cache 4K + 64 locations. The existing 256K L2 Thunderbird chip, due to its 64 byte cache line and victim cache, can cache 3K locations. There will often be overlap between the second half of a P4 cache line and the second of two Thunderbird cache lines, but there will also often be cases when P4 is busy doing an unneeded read (leaving its pipeline stalled) while Thunderbird is reading needed bytes from a different memory page.

P4's L2 and trace cache are limited to "8-way" caching. The 8K 4-way L1 adds 4 more LSB locations for instructions. Since most threads start off at the same Least Significant Bit (LSB) address (0), and thread page LSBs will usually be the same as memory page LSBs, P4's architecture can start thrashing its cache with as few as 6 threads (each thread using one LSB location for data and one for instructions). Thunderbird's Athlon architecture gives it two 2-way L1s (instruction and data) and a 16-way victim L2, so Thunderbird can cache 20 LSB cache locations, letting its cache support 67% more threads.

Thunderbird starts out with a larger L1. P4's L1 trace cache is more efficient than its 256K L2 and 8K L1 data cache, but has been described as equivalent to a 12K instruction cache.

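The way-counting arithmetic above can be checked with a rough sketch. It takes the way counts exactly as given in the post and assumes, as the post does, that every thread needs one way for data and one for instructions in the same set. The function and variable names are made up for illustration, and this is only a back-of-the-envelope model, not a simulation of either chip.

```python
# Back-of-the-envelope model of the way-counting argument.
# Way counts are the post's figures; all names here are hypothetical.

def lines_per_kib(line_bytes):
    """Number of cache lines that fit in 1 KiB at the given line size."""
    return 1024 // line_bytes

def threads_at_capacity(total_ways, ways_per_thread=2):
    """Threads whose lines fill every way of one set, assuming each
    thread holds one way for data and one for instructions."""
    return total_ways // ways_per_thread

P4_WAYS = 8 + 4           # 8-way L2 plus 4-way L1
TBIRD_WAYS = 2 + 2 + 16   # two 2-way L1s plus 16-way victim L2

p4_threads = threads_at_capacity(P4_WAYS)
tbird_threads = threads_at_capacity(TBIRD_WAYS)

# Relative headroom: how many more threads Thunderbird fits, in percent.
extra_pct = 100 * (tbird_threads - p4_threads) / p4_threads

print(f"P4: {P4_WAYS} ways, {p4_threads} threads")
print(f"Thunderbird: {TBIRD_WAYS} ways, {tbird_threads} threads")
print(f"Extra threads on Thunderbird: {extra_pct:.0f}%")
print(f"Lines per KiB at 64 B: {lines_per_kib(64)}, at 128 B: {lines_per_kib(128)}")
```

Under these assumptions the 12 versus 20 way split works out to 6 versus 10 threads, and a 64 byte line gives twice as many lines per kilobyte as a 128 byte line.
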
Thunderbird's cache architecture has two big advantages for server use compared to P4. First is its ability to cache twice as many total locations per K of cache, and the additive characteristic of the victim cache (unlike the P4, Thunderbird doesn't duplicate the L1 in the L2). Second is the greatly increased "wayness" of Thunderbird's cache, which makes it a much better processor for systems like servers or SMP workstations that are likely to see many concurrent threads.

Between these two factors (more efficient cache line size and the capability to store 20 instead of 12 LSB locations), a standard 256K L2 cache Thunderbird will have about the same cache performance characteristics as a 1 meg L2 Foster. Even with 2 or 4 meg of L2 cache, Foster will be suffering from its lack of "wayness" when running the number of threads typically seen on a server.

Now consider, on top of all this, the extra load P4 puts on the cache due to its pre-fetching...

If AMD ever actually ships an SMP Athlon of some sort, Foster will have a very tough time competing with it in server applications.

Dan