Dear Ali:
Re: "1. In the PC business, performance sells, and will be. Either MHz-based, which is easy to communicate to buying public, or true performance, which must have a clear undisputed lead."
Well, those who look at true performance seem to agree that AMD has that lead. Hammer will just make it obvious to everyone.
Re: "2. As core frequencies continue to rise, the gap between memory and processor will continue to widen, and off-chip traffic will eventually dominate on current platforms."
This is part of the P4's problem: it spends bandwidth trying to hide latency, and more and more of that bandwidth is wasted in the pursuit. So the efficiency of the available bandwidth is lower with the P4, not higher, and to make up for it the P4 needs a larger cache, among other things. By your own criteria, Intel is going down the wrong path even harder than AMD. Hammer actually has more off-chip communication capacity than the P3, P4, or IA-64, and it will use that bandwidth even more efficiently than the Athlon does.
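To put rough numbers on the efficiency point (the bus figure and the prefetch accuracy below are numbers I'm assuming for illustration, not measured P4 or Athlon data), here is a quick back-of-the-envelope in C:

    /* Back-of-the-envelope for the bandwidth-efficiency argument above.
       The bus figure and prefetch accuracy are assumed, not measured.   */
    #include <stdio.h>

    int main(void)
    {
        double bus_bw   = 3200.0; /* MB/s of raw bus bandwidth, assumed  */
        double accuracy = 0.5;    /* fraction of prefetched lines that
                                     actually get used, assumed          */

        /* Lines fetched speculatively but never used still cross the
           bus, so only bus_bw * accuracy is doing useful work.          */
        double useful = bus_bw * accuracy;

        printf("raw bus bandwidth : %.0f MB/s\n", bus_bw);
        printf("useful bandwidth  : %.0f MB/s at %.0f%% prefetch accuracy\n",
               useful, accuracy * 100.0);
        return 0;
    }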
Re: "3. Effectiveness of x86 instruction set architecture (ISA) seems to reach its limit - no matter what the implementation is, inner performance is about the same."
Yet Hammer is better than the AXP, which is better than the P3, which is better than the P4, and IA-64 is not doing as well as the P4. RISC buys performance at a cost in bandwidth; CISC ISAs like x86 are better at functionality versus code size. Interpreted languages like BASIC, Forth, Smalltalk, and Perl have an even higher functionality-to-code-size ratio, and each of their instructions does far more work than an x86 instruction. So if code and data bandwidth is the issue, going EPIC is not the way. Remember the Z8? It could run BASIC directly (limited, yes, but it worked).
Heck, a Z80 running BASIC uses only 64KB of memory in total. You could put that on a single die, and its performance might outrun today's systems while needing far less bandwidth to do the same job. Its die size would be in the tens of mm2, and it would need only a small Ethernet or memory link.
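To make the code-density point concrete, here is a toy one-byte-per-opcode stack machine in C. It is only a sketch, not a model of any real BASIC or Forth system, but it shows how an interpreted program packs the same work into far fewer code bytes than native x86 would, trading bus traffic for decode work on the chip:

    /* A minimal stack-machine sketch of the code-density argument above.
       Each "instruction" is one byte; the same work compiled to x86
       would typically take several bytes per operation.                 */
    #include <stdio.h>

    enum { OP_PUSH, OP_ADD, OP_MUL, OP_PRINT, OP_HALT };

    static void run(const unsigned char *code)
    {
        int stack[16], sp = 0;

        for (;;) {
            switch (*code++) {
            case OP_PUSH:  stack[sp++] = *code++;            break;
            case OP_ADD:   sp--; stack[sp - 1] += stack[sp]; break;
            case OP_MUL:   sp--; stack[sp - 1] *= stack[sp]; break;
            case OP_PRINT: printf("%d\n", stack[sp - 1]);    break;
            case OP_HALT:  return;
            }
        }
    }

    int main(void)
    {
        /* (2 + 3) * 4 in ten code bytes total. */
        const unsigned char prog[] = {
            OP_PUSH, 2, OP_PUSH, 3, OP_ADD,
            OP_PUSH, 4, OP_MUL, OP_PRINT, OP_HALT
        };
        run(prog);
        return 0;
    }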
Re: "4. To get more performance, the off-chip traffic must be reduced, which means better caches, which means bigger on-chip caches."
See above for an approach that would reduce off-chip traffic while using smaller caches. You need to think outside the box.
Re: "5. Bigger cache requires bigger die. Therefore, strategically, the 300mm fabbing and big-die big-cache chips will have increasing advantage, and will be more and more economical as die shrinks."
Only if you stay on the same process and do not use other techniques to reduce that need. How about putting the caches on the memory dies instead? Micron showed that you could put those caches on the northbridge instead. AMD's Hammer uses a communications net to share the caches of all the CPU dies in a system; the fact that it also boosts overall memory size and bandwidth, and allows I/O to be reached from anywhere within the net, is a bonus.
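To illustrate what sharing caches over such a net buys, here is a rough latency model in C. The hit rates and latencies are numbers I'm assuming for illustration, not Hammer specifications:

    /* Illustrative model of the "share the caches over the net" idea:
       on a local miss, a node probes its neighbours' caches before
       paying the full trip to DRAM.  All figures are assumed.         */
    #include <stdio.h>

    int main(void)
    {
        double local_hit  = 0.90;  /* assumed local cache hit rate       */
        double remote_hit = 0.05;  /* assumed hit rate in a remote cache */
        double t_local    = 3.0;   /* ns, assumed local cache latency    */
        double t_remote   = 60.0;  /* ns, assumed hop to a remote cache  */
        double t_dram     = 150.0; /* ns, assumed DRAM latency           */

        /* Average memory latency with and without remote-cache sharing. */
        double shared   = local_hit * t_local
                        + remote_hit * t_remote
                        + (1.0 - local_hit - remote_hit) * t_dram;
        double unshared = local_hit * t_local
                        + (1.0 - local_hit) * t_dram;

        printf("avg latency, caches shared over the net : %.1f ns\n", shared);
        printf("avg latency, local cache only           : %.1f ns\n", unshared);
        return 0;
    }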
Re: "6. Smaller-die theory will break: the die cannot be made smaller than certain size, I guess about 80-100mm2, because of pad/bump limitations, and current density/power dissipation limits. The smaller die will not scale down as well as bigger die."
Yet both Intel and AMD have shown that chip pin-density limits recede over time; pins per die are much higher now than just a few years ago. Later, they may use current process technologies to link many future-process dies together on one substrate, the way IBM does with its MCMs. And when copper-based communication hits a wall, there is always optical communication: you can put about 70K lasers and photodiodes on such a die, far more than your few-hundred limit, and each allows roughly 10^12 bits per second per fiber without wavelength multiplexing.
Granted, the expense for that is still rather high and copper serves well enough for now, but in the future that may change.
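Taking those optical figures at face value (70K links at about 10^12 bits a second each; these are the quoted numbers, not measurements), the aggregate works out as follows:

    /* Quick arithmetic on the optical-link figures above, taking the
       70K lasers and ~10^12 bits/s per fiber as given.                */
    #include <stdio.h>

    int main(void)
    {
        double links      = 70e3;  /* lasers/photodiodes per die, as claimed */
        double per_link   = 1e12;  /* bits per second per fiber, as claimed  */
        double total_bits = links * per_link;

        printf("aggregate: %.1e bits/s (~%.0f terabytes/s)\n",
               total_bits, total_bits / 8.0 / 1e12);
        return 0;
    }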
Re: "7. That's it. I don't know what AMD is thinking, but the published tactics of "smaller die" is poised to fail again, IMHO."
Well, let's take a different tack. If your big-die theory is so good, would it justify a die twice as large as AMD's? Three times as large? You know that Hammers will connect to each other without any of the glue logic the P4 requires. At twice the size, would two small-die CPUs beat one large-die CPU? How often would the reverse be true? On most of the code run today on servers and the like, two CPUs would beat one CPU at least 90% of the time, especially in the standard multitasking environments now typically found even on users' desks.
I submit that two 100mm2 Hammers would outrun a 200mm2 big-die P4 Xeon manufactured at about the same time on the same process generation, and it is likely that one 100mm2 Hammer would outrun that P4 more than 50% of the time. That throws your big-die theory onto the same heap of failures.
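Putting rough numbers on that comparison (the speedup figures below are assumptions for the sake of argument, not benchmarks of real Hammer or Xeon parts):

    /* Sketch of the two-small-dies-vs-one-big-die comparison above.
       The speedup numbers are assumptions for illustration only.     */
    #include <stdio.h>

    int main(void)
    {
        double big_die_speedup = 1.3;  /* assumed single-thread gain from
                                          the bigger cache on the big die */
        double n_small_dies    = 2.0;
        double mp_efficiency   = 0.9;  /* assumed scaling on a multitasking
                                          load, i.e. two CPUs ~ 1.8x       */

        double big_throughput   = big_die_speedup;
        double small_throughput = n_small_dies * mp_efficiency;

        printf("one big die   : %.2fx baseline throughput\n", big_throughput);
        printf("two small dies: %.2fx baseline throughput\n", small_throughput);
        return 0;
    }

Even granting the big die a generous 30% single-thread edge, the two smaller dies win on throughput under those assumptions.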
Pete