Politics : Formerly About Advanced Micro Devices


To: Dan3 who wrote (83233), 12/16/1999 11:02:00 AM
From: Ali Chen
 
Dan, <If I am mistaken, I'd be interested in hearing why.>

You are not mistaken, Dan. There is no need to
argue with that arrogant but fairly ignorant
youth. He is doing what he was told to do,
without any afterthought of his own.

I have tried on several occasions to tell him
that raw bandwidth is not the bottleneck in
most applications. "Streaming large blocks of
data" is the dying Intel mantra used to justify
their push for a Unified Memory Architecture,
with AGP as the primary "streaming device"
pumping only texture maps (which need no
processing) and the "Intel inside" CPU as the
primary graphics processor.

We all know that the demand for 3D has shifted
this paradigm from UMA to hardware accelerators
with continuously increasing LOCAL memory
(since only local memory can provide the
bandwidth needed for modern video processing).

The limited relevance of raw bandwidth has been
demonstrated in practice during several
transitions in system memory technology: from
FP to EDO, then from EDO to SDRAM, and now we
are witnessing another one, from SDRAM to
RDRAM. In every case the raw bandwidth at least
doubled, yet the resulting system performance
gain was only in the range of 3-10%.
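
To put rough numbers on this, here is a
back-of-envelope sketch (the miss rate, latency,
and transfer times are assumptions chosen only
for illustration, not measurements of any real
system):

/* Back-of-envelope model: effect of doubling raw memory bandwidth
 * on overall CPU performance.  All numbers are assumptions chosen
 * for illustration, not measurements of any real system. */
#include <stdio.h>

int main(void)
{
    double cpi_core  = 1.0;   /* cycles per instruction with no misses  */
    double miss_rate = 0.01;  /* cache misses per instruction (assumed) */
    double latency   = 80.0;  /* fixed DRAM access latency, CPU cycles  */
    double xfer_old  = 30.0;  /* 64-byte line transfer at old bandwidth */
    double xfer_new  = 15.0;  /* same transfer at doubled bandwidth     */

    double cpi_old = cpi_core + miss_rate * (latency + xfer_old);
    double cpi_new = cpi_core + miss_rate * (latency + xfer_new);

    printf("speedup from doubling bandwidth: %.1f%%\n",
           (cpi_old / cpi_new - 1.0) * 100.0);   /* roughly 7.7% here */
    return 0;
}

Because most of a miss is fixed latency, and most
instructions never miss at all, halving only the
transfer term moves overall performance by just a
few percent.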

What's more, even today an x86 CPU has no
internal means to fully utilize the existing
memory bandwidth of regular 64-bit-wide SDRAM.
Why is this important? Because if the "streamed
data" never needed to be processed by a CPU,
the whole server business could be handled by a
simple hardware switch-multiplexer!
No need for Itanic or Athlon/Sledgehammer!
As we all know, this is not the case at all:
at a minimum, a CRC needs to be calculated and
checked on every data packet, which requires
full-blown CPU intervention, to say nothing of
more intelligent work like routing or content
processing. Maybe Intel has some ideas about
how to separate raw "streaming" from
intelligent content processing in hardware,
but I am not sure that fits well with current
software layers and trends.
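
To make the CRC point concrete, here is a minimal
bitwise CRC-32 sketch (a textbook routine offered
purely as an illustration, not code from any
particular NIC driver or protocol stack). Every
byte of a "streamed" packet still has to pass
through the CPU before the packet can be checked
or forwarded:

/* Minimal bitwise CRC-32 (IEEE polynomial, reflected form). */
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

uint32_t crc32_update(uint32_t crc, const uint8_t *buf, size_t len)
{
    crc = ~crc;                       /* standard initial inversion */
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];                /* the CPU touches every byte */
        for (int bit = 0; bit < 8; bit++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;                      /* standard final inversion */
}

int main(void)
{
    const uint8_t pkt[] = "123456789";   /* standard CRC-32 test vector */
    printf("crc32 = %08x\n", (unsigned)crc32_update(0, pkt, 9));  /* 0xcbf43926 */
    return 0;
}

Dedicated CRC offload engines exist precisely to
avoid this per-byte CPU work, which is the kind
of hardware separation of "streaming" from
processing the paragraph above is questioning.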

Therefore, all his pomposity and his claims
about holy chipset designability are BS.

Regards,
- Ali



To: Dan3 who wrote (83233), 12/16/1999 1:19:00 PM
From: Tenchusatsu
 
Dan, <But their demands on memory aren't for streaming huge blocks of memory, they are demands for many smaller bursts from random locations.>

That's where huge processor caches come in, at least for Xeon servers, and perhaps for Itanium as well (4 MB of off-chip L3 cache in the Merced module). As for EV7, well, that's why they integrated the memory controllers right onto the processor core. That's a sure-fire way to reduce the latency of main memory accesses.
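
To see why both tricks matter, the textbook average-memory-access-time formula (AMAT = hit time + miss rate * miss penalty) is enough; the sketch below uses assumed figures purely for illustration, not numbers from any actual Xeon, Merced, or EV7 system:

/* AMAT = hit_time + miss_rate * miss_penalty.  A larger off-chip L3
 * cuts the miss rate; an on-die memory controller (as on EV7) cuts
 * the miss penalty.  All figures are assumptions for illustration. */
#include <stdio.h>

int main(void)
{
    double hit_ns = 5.0;                        /* cache hit time (assumed) */
    double baseline = hit_ns + 0.040 * 180.0;   /* small cache, external controller */
    double big_l3   = hit_ns + 0.015 * 180.0;   /* larger L3: fewer misses */
    double on_die   = hit_ns + 0.015 * 120.0;   /* plus on-die controller: cheaper misses */

    printf("baseline               : %4.1f ns\n", baseline);
    printf("large L3               : %4.1f ns\n", big_l3);
    printf("large L3 + on-die ctrl : %4.1f ns\n", on_die);
    return 0;
}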

Especially in servers, the biggest cause of latency is unsustainable throughput. All this nitpicking over the additional latency of RDRAM might mean a few percentage points of performance in desktop systems, but it means absolutely nothing in servers.
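
One way to see this is a simple M/M/1 queueing sketch (an illustration with assumed numbers, not a model of any specific memory controller): once the memory system is pushed close to the throughput it can actually sustain, queueing delay dwarfs the few tens of nanoseconds that separate one DRAM technology from another.

/* Effective latency under load, M/M/1 approximation:
 *   effective latency = unloaded latency / (1 - utilization)
 * Numbers are assumptions chosen for illustration. */
#include <stdio.h>

int main(void)
{
    double unloaded_ns = 60.0;                  /* unloaded access time (assumed) */
    double util[] = { 0.10, 0.50, 0.80, 0.95 };

    for (int i = 0; i < 4; i++)
        printf("utilization %2.0f%% -> effective latency %4.0f ns\n",
               util[i] * 100.0, unloaded_ns / (1.0 - util[i]));
    return 0;
}

At 95% utilization the loaded latency is twenty times the unloaded figure, which is why sustainable throughput dominates once a server is actually busy.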

<But I'm arguing that for almost any server application, it is DDR that has better MHZ to MHZ performance due to lower latency.>

No, in fact the performance differences between DDR and RDRAM in servers are inconclusive. It's not clear to me that DDR can use its bandwidth efficiently enough in a server environment to match the potential performance of RDRAM. Besides, the main reason DDR is being pushed over RDRAM is not performance, but cost. That's another debate, however.

But like I said before, the minuscule savings in latency that you get with DDR over RDRAM mean absolutely nothing in servers. Don't take my word for it, though. Take what MPR says about HotRail's upcoming 8-way Athlon chipset:

Perhaps the most significant problem with the HotRail architecture is the extra latency added by the relatively long path each transaction must take through the chipset. ... [However,] the company points out -- correctly, we believe -- that its advantage in sustained throughput for the whole system is much more important for most server applications.

So in short, servers care more about bandwidth than latency. If you can't sustain the bandwidth, then a minuscule latency advantage of DDR over RDRAM isn't going to mean squat. This is different from desktops, where sustained bandwidth is less important, meaning that latency becomes a bigger factor in performance.

Back to the original subject regarding Alpha EV7. Yes, 16 RDRAM channels, four per processor, does seem like an insane amount of memory bandwidth. But four RDRAM channels are more easily integrated onto the processor core than four DDR channels would be. And that integration will naturally lead to lower latency. Therefore, EV7-based servers will have the advantages of high bandwidth, low latency, and very sustainable throughput. (In fact, I feel EV7 can seriously challenge Merced/Itanium in terms of performance.)

Of course, EV7-based servers with RDRAM will naturally cost more than servers based on an equivalent amount of DDR SDRAM. I guess that's the price paid for the performance. If they decide to switch to four integrated DDR controllers, I'd sure like to know, since there are some major trade-offs to consider here.

Tenchusatsu