Re: PC600 RDRAM has even worse latency than PC800
The difference in latency between PC600 and PC800 is about half the latency difference seen between SDRAM modules. There are low-latency PC600 dies and high-latency PC800 dies, but typically PC800 is 45ns while PC600 is 53.3ns. Typical PC800 improves latency by about 16% while increasing bandwidth by 33%. On the SDRAM side, CAS2 has 33% less latency than CAS3, while PC133 has 33% higher bandwidth than PC100. The available versions of each memory type show the same spread in bandwidth, but RDRAM has about half the variation in latency of SDRAM.
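For anyone who wants to check the arithmetic, here is a quick Python sketch. It uses only the figures quoted above. The CAS3 and PC100 baselines for the SDRAM comparison are an assumption, and the helper names are made up for illustration.

```python
def pct_reduction(old, new):
    """Percentage by which `new` is lower than `old`."""
    return (old - new) / old * 100.0


def pct_increase(old, new):
    """Percentage by which `new` is higher than `old`."""
    return (new - old) / old * 100.0


# RDRAM: typical latency in ns and data rate in MHz, as quoted in the post.
rdram_latency_gain = pct_reduction(53.3, 45.0)   # PC600 -> PC800
rdram_bandwidth_gain = pct_increase(600, 800)    # PC600 -> PC800

# SDRAM: CAS latency in clocks and bus clock in MHz.
# Assumed baselines: CAS3 for latency, PC100 for bandwidth.
sdram_latency_gain = pct_reduction(3, 2)         # CAS3 -> CAS2
sdram_bandwidth_gain = pct_increase(100, 133)    # PC100 -> PC133

# How large is the RDRAM latency spread relative to the SDRAM one?
latency_spread_ratio = rdram_latency_gain / sdram_latency_gain

print(f"RDRAM latency gain:   {rdram_latency_gain:.1f}%")
print(f"RDRAM bandwidth gain: {rdram_bandwidth_gain:.1f}%")
print(f"SDRAM latency gain:   {sdram_latency_gain:.1f}%")
print(f"SDRAM bandwidth gain: {sdram_bandwidth_gain:.1f}%")
print(f"RDRAM/SDRAM latency spread ratio: {latency_spread_ratio:.2f}")
```

The ratio comes out a little under one half, which is where the "half the variation" figure comes from.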
If the main issue for P4 is latency and not bandwidth, as you state, then Intel's insistence on high-latency RDRAM for P4 was stunningly, colossally stupid - but even I give Intel management a lot more credit than that.
However, your other point was an interesting one: the 128-byte-wide cacheline of P4 clears 128 bytes of the cache even if only a one-byte boolean value is read from memory. Athlon clears half that. So, in one sense, P4 always "prefetches" the next 127 bytes while Athlon "only" prefetches the next 63 bytes. (And PIII only fetched 32 bytes at a time - which should mean it has even less need for bandwidth.) The next few bytes are almost always going to be used, and very often more than that. But the next 127? Fetched (and cleared from the cache) on every access?
It may have been a good decision for a CPU with a 28 stage pipeline, but it does seem to put an additional burden on bandwidth.
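To put rough numbers on that burden, here is a small Python sketch of how much memory traffic a tiny read costs at each line size. The line sizes are the ones quoted above. The function name is hypothetical, and the sketch assumes the read starts at the beginning of a cacheline and misses the cache.

```python
# Cacheline sizes in bytes, as quoted in the post.
LINE_SIZES = {"P4": 128, "Athlon": 64, "PIII": 32}


def extra_bytes_fetched(line_size, bytes_needed=1):
    """Bytes pulled from memory beyond what the program asked for.

    Assumes the access starts at a line boundary and misses the cache,
    so whole lines are fetched to cover `bytes_needed`.
    """
    lines = -(-bytes_needed // line_size)  # ceiling division
    return lines * line_size - bytes_needed


for cpu, size in LINE_SIZES.items():
    extra = extra_bytes_fetched(size)
    print(f"{cpu}: {size} bytes fetched for a 1-byte read, {extra} of them unrequested")
```

Whether those unrequested bytes are wasted depends on how much of the line the program touches before it is evicted, which this sketch does not try to model.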
I think you have provided an excellent explanation for some of P4's observed thirst for bandwidth.
Regards,
Dan