Dear Pravin:
Since most of the CPU-to-memory bandwidth is used by L2 cache line fills and write-backs, most FSB-related transfers consist of 1 address and control cycle, 12 latency cycles, and 8 data cycles. Synchronization difficulties would add between 0 and 2 cycles. That penalty would probably amount to around 2.5% to 7.5% on memory-intensive benchmarks and about 2% to 4% for action-type 3D games. For the PIII the bus would be synchronous, because the 133 MHz FSB is exactly 1/2 of 266 MHz DDR.
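As a rough check on the cycle counts above, here is a minimal Python sketch of the per-transfer arithmetic. The function names are made up for illustration, and the model (every transfer pays the full penalty) is an assumption. It gives the raw overhead on a single transfer only, so the whole-benchmark estimates in the text would be expected to sit below its worst case, since only part of the run time is spent waiting on these transfers.

```python
# Cycle counts quoted in the post; the simple additive model is an assumption.
ADDRESS_CYCLES = 1   # address and control
LATENCY_CYCLES = 12  # memory latency
DATA_CYCLES = 8      # data burst


def transfer_cycles(sync_penalty=0):
    """Total bus cycles for one transfer, plus any synchronization penalty."""
    return ADDRESS_CYCLES + LATENCY_CYCLES + DATA_CYCLES + sync_penalty


def penalty_percent(sync_penalty):
    """Extra cycles as a percentage of the penalty-free transfer."""
    return 100.0 * sync_penalty / transfer_cycles(0)


for extra in (0, 1, 2):
    print("penalty %d: %d cycles, +%.1f%%"
          % (extra, transfer_cycles(extra), penalty_percent(extra)))
```

With these numbers a base transfer is 21 cycles, and a 2-cycle penalty is a bit under 10% of that on the transfer itself.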
One additional advantage might be that CAS latency could be specified with a resolution of 1/2 cycle. Thus a memory could have a CAS of 2.5 if it is slightly faster than the standard CAS of 3, and very fast memory might have a CAS of 1.5 instead of the standard high-performance CAS of 2. This might be allowed in the standard or as a superset. It could increase speeds by another 5% to 10%, as memory would come in four access-speed flavors rather than the current two. For Athlons, this could be a midlife kicker. Also, in a 2-CPU SMP setup, each CPU's FSB could be skewed 1/2 a clock (3.75 ns at 133 MHz) from the other, resulting in efficient interleaving of memory accesses. This could result in a 1% to 2% gain.
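The timing figures above can be reproduced with a short Python sketch. It assumes the nominal 133 MHz clock is really 400/3 MHz (a 7.5 ns period) and that CAS latency is counted in cycles of that clock. The names are illustrative only.

```python
# Nominal "133 MHz" clock taken as 400/3 MHz (assumption; gives a 7.5 ns period).
CLOCK_MHZ = 400.0 / 3.0


def period_ns(freq_mhz=CLOCK_MHZ):
    """Length of one clock cycle in nanoseconds."""
    return 1000.0 / freq_mhz


def cas_latency_ns(cas_cycles, freq_mhz=CLOCK_MHZ):
    """CAS latency in ns, assuming CAS is counted in cycles of this clock."""
    return cas_cycles * period_ns(freq_mhz)


# Half-clock skew between the two CPU buses in the SMP case.
half_clock_skew_ns = period_ns() / 2

print("half-clock skew: %.2f ns" % half_clock_skew_ns)
for cas in (1.5, 2.0, 2.5, 3.0):  # the four speed grades discussed above
    print("CAS %.1f -> %.2f ns" % (cas, cas_latency_ns(cas)))
```

Each half-cycle step in CAS is worth 3.75 ns under these assumptions, which is the same interval as the half-clock skew.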
Also, Athlon's 16-way L2 is aimed more at servers and large applications, such as databases like Oracle or Informix and OLTP (online transaction processing), than the 8-way L2 of the PIII or Xeon. It is less likely to show an improvement in small, tightly coded routines such as benchmarks, simulations, or games, because high associativity only pays off when code complexity goes up. Thus, I think the initial benchmarks favor low-associativity L2 caches, because the manufacturers want the current CPUs to look good.
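To illustrate the associativity point, here is a toy Python model of a set-associative cache with LRU replacement. It is only a sketch: the cache size, line size, and access patterns are hypothetical and do not model any real Athlon, PIII, or Xeon. It compares an 8-way and a 16-way cache of the same capacity on a tiny loop and on a pattern where 12 hot lines collide in one set.

```python
from collections import OrderedDict


def count_misses(addresses, num_lines, ways, line_size=64):
    """Count misses in a toy set-associative cache with LRU replacement."""
    num_sets = num_lines // ways
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_size
        cache_set = sets[line % num_sets]
        tag = line // num_sets
        if tag in cache_set:
            cache_set.move_to_end(tag)         # hit: mark as most recently used
        else:
            misses += 1
            if len(cache_set) >= ways:
                cache_set.popitem(last=False)  # evict least recently used
            cache_set[tag] = True
    return misses


NUM_LINES = 4096   # hypothetical cache: 4096 lines of 64 bytes
STRIDE = 512 * 64  # addresses this far apart share a set in both layouts


def looped(hot_lines, rounds=10):
    """Cycle over `hot_lines` colliding addresses for `rounds` passes."""
    return [i * STRIDE for i in range(hot_lines)] * rounds


small_8 = count_misses(looped(4), NUM_LINES, ways=8)
small_16 = count_misses(looped(4), NUM_LINES, ways=16)
big_8 = count_misses(looped(12), NUM_LINES, ways=8)
big_16 = count_misses(looped(12), NUM_LINES, ways=16)
print(small_8, small_16, big_8, big_16)
```

In this toy setup the 4-line loop misses only on its first pass in both caches, while the 12-line pattern thrashes the 8-way set on every access and fits in the 16-way set after the first pass.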
Pete