Technology Stocks : Intel Corporation (INTC)


To: Rob Young who wrote (135420)  5/18/2001 12:40:45 PM
From: Tenchusatsu
 
Rob, reiterating my four concerns regarding SMT, and relating them to Alpha:

1) Validation - The added complexity of validation will surely impact the schedules. And like Itanium and McKinley, Alpha is not immune to schedule slips.

2) Software support - The question here is whether existing applications, including the ones that already run several threads, can take advantage of SMT without any modifications whatsoever. Of course the processor has to be able to run all existing apps, but if SMT requires specific software support, then all those existing apps would be running in single-threaded mode, not SMT.

3) Memory bandwidth requirements - Certainly the Alpha folks would not have to concern themselves in this area, since EV7 provides boatloads of bandwidth. And tolerance to latency is somewhat of a don't care, since the latency of EV7 will be very low anyway. I can only assume that EV8 has a memory subsystem and infrastructure similar to EV7's, right?

4) Impact on caches - The issue here is pretty complicated, more than you and I would be qualified to address. (But that's not going to stop us, now is it?) Anyway, the problem is similar to the increased memory bandwidth requirements in SMT mode. More threads running simultaneously means more demand on the L1 and L2 caches. And if the caches aren't up to the task, you'll get lousy speed-up with SMT.

For example, say the L1 cache in your CPU is weak. It may be adequate to keep the L1 miss rate to 10% for a single thread running all by itself. But when you start adding more threads to the mix, it's possible that the L1 miss rate for each thread now increases to, say, 40%. One obvious solution seems to be to beef up that L1, but a bigger L1 generally takes longer to access, and that could slow down the entire processor. There may be other solutions, but they all incur trade-offs that would give the CPU architects nightmares.
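
To make the cache-sharing effect concrete, here is a toy sketch in Python. Everything in it is an assumption made for illustration, not a model of EV8 or any real CPU: the function name `per_thread_miss_rate`, a fully associative LRU cache of 64 lines, and threads that each loop over a private 64-line working set while issuing accesses round-robin. With cyclic access and LRU the degradation is far sharper than the 10%-to-40% figures above; real workloads would usually fall somewhere in between.

```python
from collections import OrderedDict


def per_thread_miss_rate(num_threads, cache_lines=64, working_set=64, rounds=10):
    """Miss rate seen by each thread when num_threads share one LRU cache.

    Hypothetical model: every thread loops over its own private working set
    of `working_set` lines, and threads take turns issuing one access each,
    a crude stand-in for SMT threads sharing an L1.
    """
    cache = OrderedDict()  # keys are (thread, line); order tracks recency
    misses = 0
    accesses = 0
    for _ in range(rounds):
        for line in range(working_set):
            for thread in range(num_threads):
                key = (thread, line)
                accesses += 1
                if key in cache:
                    cache.move_to_end(key)  # hit: mark as most recently used
                else:
                    misses += 1
                    cache[key] = True
                    if len(cache) > cache_lines:
                        cache.popitem(last=False)  # evict least recently used
    # Threads behave identically, so the overall rate is also the per-thread rate.
    return misses / accesses


print(per_thread_miss_rate(1))                   # 0.1: only the cold misses
print(per_thread_miss_rate(2))                   # 1.0: the two threads thrash each other
print(per_thread_miss_rate(2, cache_lines=128))  # 0.1: a doubled cache holds both sets
```

The last line is the "beef up that L1" fix. The sketch only counts misses, so it says nothing about the extra access time a larger cache would cost, which is exactly the trade-off at issue.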

No doubt the Alpha EV8 guys are all over these issues already. Surely thread-level parallelism is the next step beyond instruction-level parallelism. But exploiting TLP using SMT is very tricky, though the rewards could be significant depending on the application, platform, and many other factors.

Tenchusatsu