"In Q1 & Q2 Athlon will be unbeatable performance wise!"
A Preview of the Fastest PC Processors in the Year 2000.
aceshardware.com
By Johan De Gelas
Part 1, First Half of 2000
Will the Athlon remain the fastest PC processor throughout 2000, or will Intel's Willamette be the first choice for computation-intensive tasks? Today we hope to enlighten you with some hot new info about the cutting-edge CPUs of next year. We will take a look at how AMD will improve the Athlon in the first two quarters of next year: the Spitfire (alias the Athlon select), the Thunderbird (Athlon Ultra)... We will discuss them one by one to see what they have to offer. In the next part, we will discuss what we can expect from Intel's Willamette and the AMD Athlon Mustang.
Before we start: what follows is based on reliable sources close to AMD and Intel, but of course we had to fill in some gaps with small portions of (educated) speculation.
The new .18µ CPUs
A deep 10-stage pipeline (more technical background here) and long latencies make the Athlon a real clock speed king. Overclockers have routinely been able to overclock the 650 MHz .25µ Athlon up to 800 MHz and more, while the PIII-600, based on Intel's slightly superior .25µ process, can hardly make it up to 660 MHz. The problem with the 12-stage pipelined PIII is its unbalanced latencies: some instructions have low latencies and others much higher ones, while the Athlon's instruction latencies are all carefully balanced to obtain the highest clockspeeds.
Now that both Intel and AMD CPUs are built with the .18µ process, the Athlon is again showing its superior ability to ramp up clockspeed. While Intel's Coppermine will have a hard time reaching even 900 MHz, .18µ Athlons are already running at 900 MHz. Overclockers report that the .18µ Coppermine 733 can be overclocked to 800 MHz, the same number an overclocked .25µ Athlon reaches! It is clear that the P6-core is starting to show its age.
AMD needed to switch to the new .18µ process because .18µ Athlons are smaller (more cost effective) and consume much less power, not because the .25µ Athlon had no frequency headroom left. The only real problem for an 800 MHz .25µ Athlon is massive power consumption. Therefore, the new Athlon 750 MHz will be based on the new .18µ process, and will most likely be announced early next month.
This 750 MHz processor will once again make the Athlon the indisputable performance king. The PIII EB 800 MHz will appear in the first quarter of 2000, but we expect AMD to deliver at least 800 and 850 MHz (or 866 MHz) speed grades in the same quarter. It is also possible that a 900 MHz Athlon will enter the market around March, as AMD has already demonstrated 900 MHz Athlons with air cooling, including a copper-wired Athlon from Dresden at the webcast a few days ago.
Copper Power
In the second quarter, AMD has planned out an extremely aggressive CPU road map. While only a few of the 800 MHz chips will be manufactured with aluminum wires, I expect that Dresden will be pumping out quite a few 900 MHz cores with copper interconnects by then. The use of copper should easily give the Athlon 100 MHz more frequency headroom, while, at the same time, lowering the power consumption of the chip.
The Coppermine (image above), which does not contain any copper, will not be able to turn the tide, as we strongly doubt it has much frequency headroom left above 850 MHz, even with a tweaked .18µ process. Using a smaller process buys you frequency headroom, but once the limits of the architecture are reached, using smaller processes will result in diminishing returns. Intel's first 1 GHz CPU will be the Willamette, not the PIII.
Integrated cache has pitfalls!
Around June of 2000, AMD will release the Thunderbird and the Spitfire. Both of these Athlons have an on-die, full-speed L2 cache.
To better understand AMD's plans, let's examine previous moves. Remember the K6-III problems? The K6-III was scheduled to be launched around January '99, but the chip appeared two months late, and at this very moment it still hasn't reached more than 450 MHz, while its brother the K6-2 will reach 533 MHz (96 MHz * 5.5, like the K6-2 433) this quarter. What's the reason behind this? Yields.
The K6-III does not yield well at higher clockspeeds. A lot of the on-die L2-caches are not entirely correct, and quite a few K6-IIIs are sold with the L2-cache disabled (as K6-2s). In other words, the K6-III is not exactly a huge success for AMD because OEMs are not interested due to the higher price and lower clock speeds. It is hard to explain to most customers that a K6-III 450 is faster than a K6-2 500. You can imagine that if the same thing happens to the Athlons with on-die cache, it would be a catastrophe for AMD.
There is another problem with on-die L2-cache: the Athlon already has 128 KB of L1-cache. Basic cache design rules say that an L2-cache should always be at least 4 times bigger than the L1. Every CPU on the market right now respects that rule: the K6-III (4x), the Celeron (4x), the Athlon (4x)...
So will the Thunderbird have 512 KB of L2-cache? That would add about 50 mm² and would make the Athlon Thunderbird 152 mm² big. That is not unreasonable (the current .25µ Athlon is 184 mm²), but it would mean that AMD would face even worse problems: if producing an entirely correct 256 KB of on-die cache is difficult, you can understand that the yields on a 512 KB part would be disastrous.
We can solve all those problems by introducing two solutions:
1) the exclusive cache
There are strong indications that the Thunderbird and the Spitfire will use an exclusive L2-cache like VIA's Joshua CPU.
How does an exclusive cache work?
Let us first see what happens in a 'normal' cache. When a cache miss occurs in the L1-cache, the data or instructions that could not be found must be fetched from memory or the L2-cache. The L1 cacheline that has been least recently used (LRU) will be replaced with the newly fetched data.
With an exclusive L2-cache, the original L1 cacheline (the one that will be replaced) is first written to the L2-cache. Only when that has happened does the newly fetched line overwrite the cacheline in the L1. The L2-cache works like an 'overflow bucket' for the L1-cache.
An exclusive L2-cache has a lot of advantages: a 128 KB 2-way associative L1-cache (more info about the Athlon caches here) combined with a 256 KB 4-way associative L2-cache will behave like one big 384 KB 6-way associative cache. The hitrate will increase significantly.
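To make the 'overflow bucket' idea concrete, here is a minimal sketch of our own, not AMD's design. It assumes fully associative LRU caches, hypothetical sizes of 4 lines (L1) and 8 lines (L2), and models the 'normal' cache as one where the L2 keeps a copy of everything in the L1. The names LRU, run_exclusive and run_inclusive are ours.

```python
from collections import OrderedDict


class LRU:
    """Tiny fully associative LRU store for up to `capacity` cache lines."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()  # oldest entry first

    def __contains__(self, addr):
        return addr in self.lines

    def touch(self, addr):
        self.lines.move_to_end(addr)  # mark as most recently used

    def remove(self, addr):
        del self.lines[addr]

    def insert(self, addr):
        """Insert addr as most recently used; return the evicted line or None."""
        victim = None
        if addr not in self.lines and len(self.lines) >= self.capacity:
            victim, _ = self.lines.popitem(last=False)
        self.lines[addr] = True
        self.lines.move_to_end(addr)
        return victim


def run_exclusive(trace, l1_size, l2_size):
    """Count hits when a line lives in the L1 or the L2, never in both."""
    l1, l2 = LRU(l1_size), LRU(l2_size)
    hits = 0
    for addr in trace:
        if addr in l1:
            hits += 1
            l1.touch(addr)
            continue
        if addr in l2:
            hits += 1
            l2.remove(addr)  # the line moves up to the L1, no copy stays behind
        victim = l1.insert(addr)
        if victim is not None:
            l2.insert(victim)  # the replaced L1 line spills into the L2
    return hits


def run_inclusive(trace, l1_size, l2_size):
    """Count hits when the L2 also holds a copy of every L1 line (assumption)."""
    l1, l2 = LRU(l1_size), LRU(l2_size)
    hits = 0
    for addr in trace:
        if addr in l1:
            hits += 1
            l1.touch(addr)
            l2.touch(addr)
            continue
        if addr in l2:
            hits += 1
        dropped = l2.insert(addr)
        if dropped is not None and dropped in l1:
            l1.remove(dropped)  # keep the L1 a subset of the L2
        l1.insert(addr)
    return hits


# Hypothetical workload: loop three times over 12 lines, with L1 = 4 and L2 = 8.
trace = list(range(12)) * 3
exclusive_hits = run_exclusive(trace, 4, 8)
inclusive_hits = run_inclusive(trace, 4, 8)
```

In this toy setup the exclusive pair holds all 12 lines (4 + 8), so every access after the first pass is a hit, while the duplicating pair only has room for 8 distinct lines and keeps missing. Real caches are set-associative and real workloads are less extreme, so take this as an illustration of the capacity argument only.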
2) Disable the part of the L2-cache that did not yield well. With modern manufacturing techniques, if only half of a CPU's integrated L2-cache yields correctly, the defective part can be disabled, leaving a CPU with a correctly working L2-cache of half the size.
Demystifying the Spitfire and Thunderbird
Now combine these two solutions with AMD's marketing strategy: the Thunderbird is aimed at both the server and the workstation market. Remember that AMD used to call the server version 'AMD Ultra' and the workstation version 'AMD Professional'.
AMD will produce (very) small amounts of Athlons with 512 KB L2-cache (speculation) and quite a few cores with 256 KB L2-cache (confirmed by several sources). The successful 512 KB cores will be sold in the server market for huge amounts of money. The 512 KB cores whose L2-cache does not yield entirely correctly will be sold as 256 KB L2-cache parts, targeted at the high-end desktop market and the workstation market.
While it isn't certain that there will be Thunderbirds with 512 KB L2-cache on die, our sources firmly confirm that 256 KB parts will appear. The Thunderbird cores whose 256 KB L2-cache does not yield well will be recuperated: if 128 KB of the L2-cache is correct, the other part will be disabled and those CPUs will be sold as 'Spitfires'. The Spitfire, alias the Athlon select, is a socketed version of the Athlon, which will compete with the Coppermine 128 in the budget market.
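The recuperation scheme described above boils down to a simple binning rule. The sketch below is our own illustration of it: the function name bin_core and the label strings are hypothetical, the 512 KB bin is speculation as noted above, and we do not know what happens to cores with less than 128 KB of working L2-cache, so the function returns None for them.

```python
def bin_core(working_l2_kb):
    """Map the amount of correctly working on-die L2-cache (in KB) to a product.

    Illustrative only: the thresholds follow the article's reasoning,
    not any published AMD binning procedure.
    """
    if working_l2_kb >= 512:
        return "Thunderbird 512 KB (server, speculative)"
    if working_l2_kb >= 256:
        return "Thunderbird 256 KB (workstation/high-end desktop)"
    if working_l2_kb >= 128:
        return "Spitfire 128 KB (budget)"
    return None  # not covered by our sources


# A 256 KB core with only half of its L2-cache working ends up as a Spitfire.
example = bin_core(128)
```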
I am pretty sure that you now understand why Intel will introduce Coppermines with only 128 KB L2-cache in the budget market. Indeed, those Celeron Coppermines will allow Intel to recuperate some of the poorly yielding Coppermines (with 256 KB L2-cache) to compete in the budget market.
It is an open secret that the PIII EB (Coppermine) yields are pretty bad: it is almost impossible to get a 733 MHz part right now, which is quite astonishing if you consider the incredible amount of manufacturing capacity Intel has.
OK, now that we have gathered all this information, let us get a general overview of quarter 2 of 2000:
CPU                       Maximum clockspeed (Q2 2000)  L1-cache  L2-cache            Chipset-CPU bus   Memory           SMP

Budget/Medium PC
AMD K6-2+                 600 MHz                       64 KB     128 KB              133 MHz ?         100 MHz SDRAM    no
AMD K6-III+               600 MHz                       64 KB     256 KB              133 MHz ?         100 MHz SDRAM    no
AMD Spitfire              800 MHz                       128 KB    128 KB              200 MHz*          100 MHz SDRAM    2-way
Intel Coppermine 128      700 MHz                       32 KB     128 KB              133 MHz           100 MHz SDRAM    not official

Workstation/High-end Desktop
AMD Thunderbird           1 GHz                         128 KB    256 KB              266 MHz*          133 MHz DDR **   2-way
AMD Athlon                900/866 MHz                   128 KB    512 KB (1/2 speed)  200/266* MHz      100/133 ? SDRAM  2-way
Intel PIII EB             866 MHz                       32 KB     256 KB              133 MHz           RDR PC800        2-way

Server
Intel PIII Xeon           866 MHz                       32 KB     512-2048 KB         133 MHz           RDR PC800        8-way
AMD Athlon Thunderbird ?  1 GHz                         128 KB    512 KB ?            266 MHz* EV6-bus  133 MHz DDR      2-way
*266 MHz/200 MHz chipset bus = 2 (Double Data Rate ) x 133/100 MHz
The Athlon 'Spitfire' is the interesting chip here. It will plug into the 462-pin Socket A and will lag behind the other Athlons in clockspeed, because it competes with Intel's low-cost offerings. The 'Spitfire' should be a very attractive chip for gamers: easy to overclock (sold at lower clockspeeds), relatively cheap, and 256 KB on-die cache is perfect for gaming purposes.
This chip will have a die size of about 114 mm², and this, together with the low-cost Socket A motherboards, should enable it to compete with the Coppermine 128 (+/- 92 mm²). If the Spitfire is introduced in a timely manner, the K6-2+ and K6-III+ will probably never reach more than 600 MHz. The short pipeline of the K6 makes it hard, even with an advanced .18µ copper process, to reach such high clockspeeds. It is clear that AMD wants to phase out the K6-line as soon as the 'Spitfire' arrives, as the Socket A Athlon will have no problem competing with the Coppermine 128. The .18µ K6-line, thanks to its very small die size (< 70 mm²) and low power consumption, will continue to exist as a mobile processor.
The 'Thunderbird' will be the choice of power users: it will be available at higher clockspeeds. Both a Slot A and a Socket A version will be marketed, which makes me suspect that even the Slot A version will not have any third-level SRAM cache. As we know that the next Athlon ('Mustang') will have 2 MB on-die cache, it is clear that both AMD and Intel prefer on-die cache over the SRAMs that we now find in the Athlon and PIII: most applications run fine in 256 KB to 384 KB of cache, and SRAM latencies are up to 3-5 times higher than those of on-die cache (see below).
Considering the limited experience of AMD and its partners (VIA, ALI, SIS) with multiprocessor chipsets, only two-way Athlon systems will be available in quarter 2. Four- to eight-way systems, based on AMD's own chipsets, might be available in quarter four.
Is the Athlon Cache hungry?
About a year ago, we proved that the K6-core gains more from a fast L2-cache than the P6-core. So what about the K7-core? Will it benefit a lot from an on-die L2-cache?
It is true that when a CPU gets parallel access to two fast caches instead of one fast and one slow cache, performance will always increase. As the Athlon has 128 KB of L1-cache and huge buffers masking memory latencies, you could think that the L2-cache is less important. Nevertheless, our benchmarks show that the Athlon loses quite a lot of performance if you disable the L2-cache. So is the Athlon cache hungry or not?
Well, let us see how fast the CPUs out there get access to their data. To get an idea of how the different cache configurations compare, take a look at the table below.
Features              PIII EB    PIII       Celeron A  Athlon     K6-III
L1-cache              32 KB      32 KB      32 KB      128 KB     64 KB
L2-cache              256 KB     512 KB     128 KB     512 KB     256 KB
L1-cache latency      3 cycles   3 cycles   3 cycles   3 cycles   2 cycles
L2-cache latency      4 cycles   24 cycles  8 cycles   21 cycles  11 cycles
Total L2-latency      7 cycles   27 cycles  11 cycles  24 cycles  13 cycles
Datapath to L2        256        64         64 ?       64         256
Set-associativity L1  4          4          4          2          2
Set-associativity L2  8          4          8          2          4
It is clear that the PIII EB has by far the most advanced L2-cache. An L1-miss costs the Athlon an extra 21 clock cycles, while the PIII EB waits only 4 cycles longer. Even the large 72-entry buffer of the Athlon cannot totally hide 21 clock cycles, especially if you consider that the Athlon (9 execution units) can execute instructions about 50-70% faster than the K6 (6 execution units) and the PIII (5 execution units). No, the Athlon will benefit a lot from on-die cache, even if the new L2-cache does not have latencies as low as those of the PIII EB's L2-cache. AMD's engineers will probably prefer the ability to reach high clockspeeds over a low-latency L2-cache.
Keep in mind that caches are extremely important for modern CPUs: while a P133 ran at a clockspeed two times higher than the memory, a modern CPU like the Athlon or PIII EB runs 6 to 7 times faster than the memory!
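To get a feel for what those extra cycles mean, here is a rough back-of-the-envelope sketch. It uses the L1 and total L2 latencies from the table above and assumes, for simplicity, that every L1 miss is served by the L2-cache and that nothing is hidden by buffers or out-of-order execution. The 95% L1 hit rate is a hypothetical figure of ours, not a measurement, and the function name is our own.

```python
# (L1 latency, total L2 latency) in clock cycles, taken from the table above.
LATENCIES = {
    "PIII EB": (3, 7),
    "PIII": (3, 27),
    "Celeron A": (3, 11),
    "Athlon": (3, 24),
    "K6-III": (2, 13),
}


def average_latency(cpu, l1_hit_rate):
    """Average cycles per cache access, assuming all L1 misses hit the L2."""
    l1_cycles, total_l2_cycles = LATENCIES[cpu]
    return l1_hit_rate * l1_cycles + (1.0 - l1_hit_rate) * total_l2_cycles


# Hypothetical 95% L1 hit rate for every CPU, purely for comparison.
for cpu in LATENCIES:
    print(f"{cpu:10s} {average_latency(cpu, 0.95):.2f} cycles")
```

With that assumed hit rate, the Athlon averages about 4.05 cycles per access against 3.2 for the PIII EB. The Athlon's much larger L1-cache should give it a better hit rate in practice, so the real gap depends heavily on the workload.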
Conclusion
The Athlon allows AMD to diversify its product line in a way it has never been able to before. It is clear that an Athlon with on-die L2-cache and running with faster memory types will be unbeatable performance wise. Only two things can hamper the Athlon's ability to rule the CPU market in the first half of 2000: a lack of Slot A and Socket A motherboards, and ubiquitous ISSE support.
AMD's hardware is ready for battle, but now it must get its foot in the software developers' and motherboard manufacturers' doors. But what about the Willamette? Well, it won't appear before quarter three. So don't forget to check us out soon, as we will post part two, which will discuss quarters three and four.
Special thanks to Andreas Kaiser and Idiot.