Technology Stocks : Intel Corporation (INTC)


To: chic_hearne who wrote (109117) 8/31/2000 3:13:00 PM
From: Tenchusatsu
 
Rob: And tell the less informed about the detail inherent in your 45 Gigabytes/sec number.
Chic: Not sure what you're getting at.

He's asking for a little context behind that very impressive-sounding 45 GB/sec figure. I guess he's not very familiar with the POWER4 architecture.

On the one hand, that 45 GB/sec figure is pure marketing. It's just the sum of the peak bandwidths of all the interchip connections in the quad-chip module. Now if the workload were balanced absolutely perfectly, and every CPU were accessing data spread evenly across every L2 cache, we might see total system bandwidth within the module approaching the 45 GB/sec mark. But in real-world conditions, the actual system bandwidth may be much lower due to imbalanced bursts, unevenly distributed data, and other bottlenecks.
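
To put rough numbers on the gap between a summed peak and what an uneven workload can sustain, here's a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the four equal links of 11.25 GB/sec are just 45 divided by four, not POWER4's real link layout, and the function names and traffic shares are made up. The model is simply that total throughput is capped by whichever link saturates first.

```python
def peak_aggregate(link_bw):
    """Marketing-style number: add up every link's peak bandwidth."""
    return sum(link_bw)


def sustainable_throughput(link_bw, traffic_share):
    """Highest total throughput T such that T * share <= bandwidth on every link."""
    if len(link_bw) != len(traffic_share):
        raise ValueError("need one traffic share per link")
    if abs(sum(traffic_share) - 1.0) > 1e-9:
        raise ValueError("traffic shares must sum to 1")
    # The busiest link (relative to its capacity) sets the ceiling.
    return min(bw / share for bw, share in zip(link_bw, traffic_share) if share > 0)


# Hypothetical module: four equal links that add up to 45 GB/sec.
links = [11.25, 11.25, 11.25, 11.25]
balanced = sustainable_throughput(links, [0.25, 0.25, 0.25, 0.25])
skewed = sustainable_throughput(links, [0.55, 0.15, 0.15, 0.15])
print(peak_aggregate(links), balanced, round(skewed, 1))
```

With perfectly even traffic the sustainable figure equals the 45 GB/sec sum. Push a bit more than half the traffic onto one link and, under these made-up numbers, it falls to about 20 GB/sec, even though the summed peak hasn't changed.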

But on the other hand, there's no denying that one of POWER4's strengths will be the interchip bandwidth. Indeed, all L2 caches in the POWER4 module will be accessible by all eight CPU cores with minimal latency. This can have a tremendous impact on server benchmarks and other tasks involving shared memory and transaction processing.

In contrast, the Alpha 21364 will feature four RDRAM controllers integrated on the CPU die. Then (at least in concept), sixteen of these bad boys can be arranged in an interconnect mesh. Each CPU will have its own quad-channel RDRAM memory, and provided most of that CPU's data is localized, this leads to very low latency memory accesses. For non-local data, the CPU will have to send an access through the interconnect mesh to the specific CPU that owns the non-local data.
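
Here's a rough sketch of why locality matters so much in that kind of arrangement. It's a toy model, not the 21364's actual numbers: the 4x4 grid without wraparound links, the per-hop cost, the latencies, and the function names are all assumptions for illustration. A remote access is modeled as the local memory latency plus a fixed cost for each hop through the mesh.

```python
from itertools import product


def mean_hops(side):
    """Average hop count between two distinct CPUs on a side x side grid,
    assuming Manhattan routing and no wraparound links."""
    nodes = list(product(range(side), repeat=2))
    total = 0
    pairs = 0
    for (x1, y1), (x2, y2) in product(nodes, repeat=2):
        if (x1, y1) != (x2, y2):
            total += abs(x1 - x2) + abs(y1 - y2)
            pairs += 1
    return total / pairs


def average_latency(local_fraction, local_ns, hop_ns, hops):
    """Blend of local and remote access times for a given locality."""
    remote_ns = local_ns + hops * hop_ns  # remote = owner's memory + mesh hops
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


hops = mean_hops(4)  # sixteen CPUs in a hypothetical 4x4 grid
for local in (1.0, 0.9, 0.5):
    # 100 ns local and 30 ns per hop are placeholder values, not measurements.
    print(local, round(average_latency(local, 100.0, 30.0, hops), 1))
```

The point of the sketch is only the shape of the curve: as the share of local accesses drops, the average creeps toward the remote cost, which is why the NUMA approach leans on keeping each CPU's data close to home.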

In short, Alpha 21364 takes your typical NUMA model of multiprocessing to the next level. This is very different from the traditional UMA model of IBM's POWER4, which is why trying to compare the two in terms of bandwidth is as useful as comparing apples to oranges, or trains to trucks, etc.

Tenchusatsu