Rob, re: Alpha 21364
<Your comment about "remote" being much longer.. I don't see what you mean. There are only 4 processors in a 4 processor box and the memory access remote (i.e. the memory hanging off another processor) is 100 ns. If you look at the 21364 it acts as a network router, are you overlooking that?>
In a 4-way system, a remote access takes two or four hops round-trip. There's no way those hops can take place within 100 nsec. I think that 100 nsec figure is actually the worst-case latency for "regular" (local) accesses where an RDRAM bank has to precharge. Delays due to precharging don't happen often in RDRAM because of the large number of banks, but they do happen.
<Absolutely.. and to quote John McCalpin (author of STREAM, THE Memory Bandwidth metric) "it's the bandwidth stupid!">
The one caveat about memory bandwidth is that large L2 caches reduce its impact.
By the way, one big weakness of a 4-way 21364 system is going to be memory capacity. There are only 16 RDRAM channels in such a system, meaning that with 128 Mbit DRDRAM technology, you can only get half a gigabyte per channel, or 8 gigabytes total. That limit will double as 256 Mbit DRDRAMs become more common, but that's still not big enough for a 64-bit memory bandwidth monster. Compaq may have to resort to branch channels, which increase latency somewhat, or they may have to use SDRAM-to-RDRAM converters, which also hurt performance.
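To spell out the capacity arithmetic, here is a rough sketch. It assumes a fully populated channel holds 32 devices, which is the Direct RDRAM per-channel limit as I understand it and is what makes the half-gig-per-channel figure work out. The function name and parameters are just for illustration.

```python
def capacity_gb(mbit_per_device, devices_per_channel=32, channels=16):
    """Total memory in gigabytes for a fully populated system.

    devices_per_channel=32 is an assumption (Direct RDRAM channel limit);
    channels=16 is the 4-way 21364 figure quoted above.
    """
    # Megabits per device -> megabytes per channel (8 bits per byte).
    mb_per_channel = mbit_per_device * devices_per_channel / 8
    # Megabytes across all channels -> gigabytes.
    return mb_per_channel * channels / 1024


print(capacity_gb(128))  # 128 Mbit parts: 8.0
print(capacity_gb(256))  # 256 Mbit parts: 16.0
```

So even with 256 Mbit parts the ceiling is 16 gigabytes unless the channel count or the devices per channel go up.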
Tenchusatsu