Jeff R, thanx 4 an interesting post, but I disagree with a few things. With a dedicated backside cache the access latencies are lower than with the Socket 7 style frontside bus, plus the L2 cache scales with processor frequency, thus giving it a big advantage.
Obviously, this will disappear with the K6-3 with a 256K on-chip cache and no external bus at all for 90-95% of memory reads. It will also disappear with the K7, which has a much superior BSB.
It's remarkable in my mind that the K6 can get even close to the Pentium II in clock frequency, given Intel's advantage in deeper pipelining and so-called inherent process advantage.
You never answer why AMD is able to match Pentium II performance up to 350 MHz with a 100 MHz L2 cache, compared to a Pentium II with a 175 MHz L2 cache. The answer is that AMD has more parallelism built into the K6 core, especially for integer and logical operations, than the Pentium II. I find that the SiSoft Sandra benchmark on my K6-2-300 gives results 95% of those of the BX chipset Pentium II-400.
Actually, AMD has a more complex process than Intel in that they have 5 levels of metallization with local interconnect, plus shallow trench isolated transistors with C4 bonding. I think Intel is using 4 levels of metallization and is still using wire bond technology. Intel's advantage is that their manufacturing process is probably less complex and therefore yields higher, at the price of a considerable die size disadvantage.
This is a cost of production issue. While I don't know exact yield figures, they are certainly above 50%, and at any given time, centered on the "next to highest" speed grade. Intel's yields are certainly below 80%, no matter what Paul Engel may tell you. The fact is, Intel will have to learn the complex AMD process steps sooner or later, and I wouldn't expect high yields on their first attempts.
I'm wondering whether one of the K6-3 (also K6-2-380 and K6-2-400) improvements is to implement "write combining." Based on a comparison of Winstone results from K6-2's (300 to 350) running various speed L2 cache clocks (66 MHz to 112 MHz) and various speed main memory access (66 MHz to 112 MHz) I developed a model to predict the performance of the K6-3. Even with an assumption that a 256K L2 cache will experience 41% more "misses" than a 512K L2, the model predicts that a K6-3-400 will outperform a Pentium II-500 with 100 MHz bus, but be equivalent to a Pentium II-504 running a 112 MHz bus. (BTW, this was assuming 10% core speed improvement.)
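As a side note on the 41% figure: it matches what the common square-root rule of thumb gives (miss rate roughly proportional to 1/sqrt(cache size)) when the cache is halved. That rule is an assumption here, used only to show the arithmetic; it is not necessarily how the model above was built, and the function name is made up for illustration.

```python
import math

def relative_miss_rate(size_kb, reference_kb):
    """Miss rate of a size_kb cache relative to a reference_kb cache,
    under the ASSUMED rule of thumb: miss rate ~ 1 / sqrt(size)."""
    return math.sqrt(reference_kb / size_kb)

# 256K on-chip L2 (K6-3) compared with a 512K L2
ratio = relative_miss_rate(256, 512)      # sqrt(2)
extra_misses_pct = (ratio - 1.0) * 100.0  # percent more misses than the 512K cache

print(f"256K vs 512K: {extra_misses_pct:.0f}% more misses")
```

Under that rule, halving any cache size gives the same 41% penalty in misses, regardless of the starting size.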
Perhaps AMD is having trouble getting the memory transistors to run at 400 MHz, but wants to take advantage of the core improvements. That would be consistent with why they are not including on-die L2 for the K7, since if 400 MHz is difficult, 550 MHz is probably impossible.
Petz