Sorry if this info has been posted. Seems to me that AMD is going to give INTEL some pretty fierce competition! Now, if they could only do some advertising!!!!! You know, snazzy commercials! <gggggggggggggg>
RichieH
By Thomas Pabst
Introduction
It is not the first time that one of Intel's competitors in the x86 market seems to have a very good product coming up. I can remember a few occasions when an announcement from a non-Intel CPU maker made us believe that it would give Intel a good run for its money. However, I cannot remember a time when it looked almost obvious that the performance of even Intel's high-end x86 products would be not only matched, but surpassed.
AMD's announcement of its upcoming K7 CPU at the Microprocessor Forum 98 in San Jose could indeed mark a major change in the x86 CPU market in 1999. K7 will almost certainly beat Pentium II (Deschutes) performance, and it looks very likely that even Intel's next high-end product, code name ‘Katmai’, will have a serious problem competing against K7.
This is not all, though. AMD is very close to shipping the K6-2 at 400 MHz. This CPU will not just be the well-known ‘normal’ K6-2 core running at 400 MHz; it will use a new, revised core that offers increased performance. As a result, the K6-2 400 will hardly fall short of Pentium II CPUs at the same clock speed.
Then there is ‘Sharptooth’, the CPU core that will be used in the upcoming ‘K6-3’. This core will be AMD's first CPU core with on-die L2 cache, as already found in Intel's Celeron 300A and Celeron 333, code name ‘Mendocino’. 256 kB of L2 cache running on a backside bus at CPU clock speed will improve the performance of the K6-2 core tremendously, making it faster than a Pentium II at the same clock speed. The additional beauty of ‘Sharptooth’ is that it will simply fit into any Super7 board, making it a perfect candidate for super simple upgrades. Socket 7 platforms will then be faster than, or at least as fast as, the fastest Slot 1 platforms for the first time in history. This should be a serious reason not to leave Socket 7!
Sharptooth is no vain dream or simple promise; it has already been up and running for quite a while. One reason why AMD is not shipping Sharptooth yet is that the OEMs don't want AMD to launch a new CPU right before the Xmas market. The other reason is that AMD expects the OEMs to pay a higher price for this significantly improved product, which is still subject to negotiation. Sharptooth will ship at the beginning of 1999, maybe slightly earlier, at clock rates of up to 450 MHz. It's certainly annoying to a lot of us that we cannot have Sharptooth now, but the outlook for K6-3 should definitely keep everyone from leaving Super7 in favor of Slot 1.
You can see that AMD is already coming on very strong, offering very high-performing solutions for Socket 7 platforms. The following shall explain why Socket 7 platform owners should have no reason ever to move over to Slot 1. It could also be that Slot 1 platform owners will have to consider leaving Slot 1 to achieve the highest performance. The answer is ‘Slot A’, and the CPU that plugs into this slot will be called ‘AMD K7’.
Dirk Meyer's presentation of the K7 features at Microprocessor Forum 1998 on October 13th was certainly the most impressive presentation in the CPU section. Dirk drowned the auditorium in a flood of high-tech terms at ultra-high speed, while keeping a poker face as if he were telling us about the weather. Hardly anybody was able to follow him. I was one of the lucky people who had the chance to interview Dirk a couple of hours later, side by side with Dana Krelle, AMD's VP of marketing. I admit that I still don't understand all of the great features of K7, but I think I've got a pretty good idea.
Why AMD's K7 will be Intel's toughest competitor ever
The CPU Bus
As is already pretty well known, K7, and thus Slot A, does not use Intel's P6 GTL+ bus protocol but Digital's Alpha bus protocol, ‘EV6’. EV6 has a lot of architectural advantages over GTL+, such as its point-to-point topology for multiprocessing, and in the case of the K7 it even runs at 200 MHz. This means that K7 looks set to be the first CPU that can really take advantage of high-bandwidth memory types like Direct RDRAM and DDR SDRAM. Intel's GTL+ running at 100 MHz has a peak bandwidth of only 800 MB/s, and at 133 MHz it will have only 1066 MB/s, so you have to wonder why Intel's next chipset for Katmai will have Direct RDRAM support. Direct RDRAM, as well as DDR SDRAM running at 100 MHz, offers a peak bandwidth of 1.6 GB/s, and this bandwidth is matched only by K7's 200 MHz EV6 bus. I guess that AMD will have to thank Intel for pushing Direct RDRAM, because K7 seems to be the first CPU that will really need it. Once again, in short: K7's EV6 offers excellent multiprocessor support and the highest bus bandwidth, and is overall superior to GTL+.
L1 Cache
K7 will have no less than 128 kB of L1 cache: 64 kB data and 64 kB instruction cache. Pentium II is currently equipped with a quarter of that, and it's rumored that Katmai may have at least 2x32 kB and thus half the L1 cache size of K7. The large L1 cache is one of the requirements for very high CPU clock speeds, and K7 was specially designed to reach those very high clock speeds.
L2 Cache
K7 will come with a backside L2 cache, as known from Intel's P6 architecture. AMD will be pretty flexible with this L2 cache. The K7 CPU has an internal tag RAM large enough for 512 kB of L2 cache, but AMD is also planning K7 versions with no less than 2 MB and up to 8 MB, using an additional external tag RAM as Intel does with the P6 CPUs. The L2 cache speed will range from 1/3 of CPU speed to full CPU speed, and the plan is to use ‘normal’ as well as double data rate (DDR) SRAMs for this L2 cache.
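As a quick check on the bus figures quoted in the CPU Bus section, peak bandwidth can be estimated as bus width times transfer rate. The sketch below is only illustrative: the 64-bit data width, the reading of ‘133 MHz’ as 133 1/3 MHz, 1 MB counted as 10^6 bytes, and the function name are assumptions not stated in the article.

```python
def peak_bandwidth_mb_s(clock_mhz: float, bus_width_bits: int = 64) -> float:
    """Peak transfer rate in MB/s, assuming one transfer per quoted clock.

    bus_width_bits=64 is an assumption; the article does not state the width.
    """
    bytes_per_transfer = bus_width_bits / 8
    return clock_mhz * bytes_per_transfer


if __name__ == "__main__":
    buses = [
        ("GTL+ at 100 MHz", 100.0),
        ("GTL+ at 133 MHz", 400 / 3),  # assumed to mean 133 1/3 MHz
        ("EV6 at 200 MHz", 200.0),
    ]
    for label, mhz in buses:
        print(f"{label}: about {peak_bandwidth_mb_s(mhz):.1f} MB/s")
```

Under these assumptions the 100 MHz and 200 MHz cases come out at 800 MB/s and 1600 MB/s, in line with the article's 800 MB/s and 1.6 GB/s, and the 133 MHz case lands between 1066 and 1067 MB/s.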
The flexible L2 cache design will enable AMD to do the same as Intel does. There will be mainstream, workstation and server versions of K7, determined by the L2 cache size and speed. The K7 will have an address space of 64 GB, like Intel's Deschutes core, and Slot A will be limited to 4 GB of addressable space, as is the case with Slot 1. The cacheable limit of K7 will also be the full address space of 64 GB.
Clock Speeds
Dirk Meyer, the chief engineer of AMD's K7, is an ex-Alpha guy. Thus it shouldn't surprise any of us that K7 was designed with very high clock speeds in mind. K7 is already running at 500 MHz. By the time K7 launches in 1H99, we should expect clock speeds way beyond that. K7 has very deep buffers to enable those high clock speeds, offering up to 72 x86 instructions in flight.
The Floating Point Unit
Haven't we been taught by Intel all these years how important the FPU is? Well, it's looking pretty obvious that K7 will smoke Intel's P6 FPU. K7 offers no less than 3 (three!) out-of-order, fully parallel FPU pipelines. The good old disadvantage of the non-Intel CPUs in terms of FPU performance will be history with K7. The upcoming seventh-generation AMD processor will run CAD or rendering software faster than the Intel CPUs. That is almost a revolution.
The K7 Integer Micro-Architecture
I guess that a discussion of all of AMD's new features in K7 would lead too far for most of you, but let me still name a few. Three parallel x86 instruction decoders, which translate the x86 instructions into fixed-length ‘Macro-Ops’, feed the K7's 72-entry instruction control unit. Each of those ‘Macro-Ops’ can consist of one or two operations. There are two different decoding pipelines that do this job: the ‘direct path’, which decodes common instructions very quickly, and the ‘vector path’, which looks up complex x86 instructions in the ‘Macro Code ROM’ or ‘MROM’. The instruction control unit issues the Macro-Ops to either the Integer Scheduler or the FPU/Multimedia Unit.
The integer scheduler can hold up to 15 Macro-Op entries, representing up to 30 operations at a time. Its job is to distribute up to three independent operations to the three parallel integer execution units, each of them accompanied by an address generation unit. The address generation units are responsible for making load/store operations as efficient as possible, by optimizing the utilization of the L1 data cache and the L2 cache as well as main memory reads and writes.
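To make the scheduler numbers above concrete, here is a toy model in Python. It is a sketch for illustration only, not AMD's actual scheduling logic: the class and method names are hypothetical, and it assumes every queued operation is independent and can issue in order, which a real out-of-order scheduler does not require.

```python
from collections import deque
from dataclasses import dataclass

SCHEDULER_ENTRIES = 15  # Macro-Op entries in the integer scheduler (from the article)
ISSUE_WIDTH = 3         # three parallel integer execution units (from the article)


@dataclass
class MacroOp:
    name: str  # hypothetical label, for illustration only
    ops: int   # one or two operations per Macro-Op (from the article)

    def __post_init__(self):
        if self.ops not in (1, 2):
            raise ValueError("a Macro-Op carries one or two operations")


class ToyIntegerScheduler:
    """Toy model: treats all queued operations as independent."""

    def __init__(self):
        self.entries = deque()  # each item is [name, remaining_ops]

    def accept(self, macro_op: MacroOp) -> bool:
        """Take a Macro-Op if an entry is free; return False when full."""
        if len(self.entries) >= SCHEDULER_ENTRIES:
            return False
        self.entries.append([macro_op.name, macro_op.ops])
        return True

    def ops_in_flight(self) -> int:
        return sum(remaining for _, remaining in self.entries)

    def issue_cycle(self) -> int:
        """Issue up to ISSUE_WIDTH operations, oldest entries first."""
        issued = 0
        while self.entries and issued < ISSUE_WIDTH:
            oldest = self.entries[0]
            oldest[1] -= 1
            issued += 1
            if oldest[1] == 0:
                self.entries.popleft()  # entry is free once all its ops issued
        return issued
```

Filling the toy scheduler with 15 two-operation Macro-Ops gives the 30 operations mentioned above, and a 16th Macro-Op is refused until an entry drains.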