Technology Stocks : Intel Corporation (INTC)

To: Tenchusatsu who wrote (135395), 5/18/2001 9:31:57 AM
From: Rob Young
 
Tench,

<1) Validation - It's hard enough validating a single-threaded processor with all of its OOO complexity. Wouldn't adding
another thread to the execution stream increase the validation space exponentially?>

Who knows? I'm not a chip/circuit designer. It certainly couldn't make validation any easier, though, I would imagine.

"2) Software support - The spinning semaphore problem is one issue that Alpha already addressed, but are there
others? Is it really trivial to get existing apps running on SMT, or will they have to support it explicitly?"

Getting apps running? It absolutely has to run existing apps (and run them well!) from the outset, or it is doomed.
I think Intel is unique in that they can (or, more accurately, *think* they can) get folks
to adopt a new architecture (IA64) with very poor support for existing binaries. Before you flame
me, it is very public knowledge that Itanium (Merced) does a very poor job, performance-wise,
of running existing IA32 binaries.
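
To make the spinning-semaphore point concrete, here is a rough C sketch. It is generic code, nothing Alpha- or EV8-specific, and the function names are my own; it just shows why a thread that busy-spins on a lock is a problem on an SMT core (it burns execution resources the other hardware thread could be using), and the usual software-side fix of backing off inside the spin loop.

/* Illustrative only: a generic sketch of the "spinning semaphore" issue.
 * On an SMT core, a thread that busy-spins on a lock chews up execution
 * resources the sibling thread could use, so the spin loop should back
 * off (yield, or use a CPU pause-style hint). */
#include <stdatomic.h>
#include <sched.h>

static atomic_flag lock_flag = ATOMIC_FLAG_INIT;

/* Naive spin: hammers the flag and the shared execution units. */
void lock_naive(void)
{
    while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire))
        ;                      /* spins, chewing resources */
}

/* SMT-friendlier spin: give the core (and the sibling thread) a break. */
void lock_polite(void)
{
    while (atomic_flag_test_and_set_explicit(&lock_flag, memory_order_acquire))
        sched_yield();         /* let the other thread make progress */
}

void unlock(void)
{
    atomic_flag_clear_explicit(&lock_flag, memory_order_release);
}

Most existing apps only hit this inside their lock libraries, which is why the fix can usually live in the OS/runtime rather than in every application.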

"3) Memory bandwidth requirements - It's been suggested that SMT processors are more tolerant to latency because if
one thread is stalled behind a cache miss, the other thread(s) can still make progress. That means sustained
bandwidth becomes much more important than before, and latency becomes much less important. (This is an
interesting phenomenon, given that the Rambus saga helped to bring about the entire latency-vs-bandwidth debate.)"

At the risk of mixing things up... EV8 is described as having 4 PCs (Program Counters). You may have a very nice multi-threaded RDB (Oracle), and yes, if one thread of execution stalls, a feature
they point out is that the thread goes into a wait state (instead of spinning and chewing up resources); there is more to it than
that... Latency will always be most important from here on out. Power4 talks about 100 GByte/sec of L2 bandwidth.
Everyone is doing prefetch in one form or another to mask main memory "limitations". And how do you talk about a bandwidth limitation when bandwidth is 12 GByte/sec for EV7 + EV8, not counting switched "remote"
memory bandwidth?
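
For a feel of why latency rather than raw bandwidth is the wall, here is a hypothetical C sketch (my own illustration, not from any EV8 or Power4 material): a pointer-chasing loop issues one dependent load per step, so every main-memory miss costs the full latency no matter how much bandwidth the memory system has. An SMT core can hide some of that by running another thread while this one stalls; a single-threaded core just waits.

/* Hypothetical latency-bound loop: the next address is unknown until
 * the previous load completes, so bandwidth can't help. */
struct node { struct node *next; char pad[120]; };   /* ~2 cache lines */

/* Walk a pre-shuffled linked list: one dependent load per step. */
long chase(struct node *p, long steps)
{
    long n = 0;
    while (p && n < steps) {
        p = p->next;   /* stalls for the full miss latency on each step */
        n++;
    }
    return n;
}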

"
4) Impact on caches - How robust will the cache hierarchy need to be in order to handle the needs of two or more
simultaneously running threads? SMT is kind of pointless if it causes massive cache thrashing."

Ahh... here is where maybe you have the wrong architecture in mind. Keep in mind that with EV7 (and
by extension EV8) you can pull directly from a remote L2. I'm not convinced, but maybe with sizes of 4 and
8 CPUs per "node" and 8 * 1.75 MByte... you have an "effective" L2 of 14 MByte. Looking at Power4 we see a
shared L2 of 8 MByte (is that right?) with a CPU count of 4. What about cache thrashing there?
Secondly, with memory controllers on chip, you MUST have low-latency CPU<->memory plus aggressive
prefetch to mask misses. Sound good, or do you have another issue here? Surely this topic is a big one for the designers,
I would think!
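
To put numbers on that "effective L2" back-of-the-envelope: the 1.75 MByte per EV7 and the 8 CPUs per node are the figures from above; the per-thread working-set size and the "thrash if the combined working sets exceed the local cache" rule are just my own illustrative simplification.

/* Back-of-the-envelope sketch of the cache-footprint argument above. */
#include <stdio.h>

int main(void)
{
    const double l2_per_cpu_mb  = 1.75;   /* EV7 on-chip L2, as cited above */
    const int    cpus_per_node  = 8;
    const int    smt_threads    = 4;      /* EV8: 4 PCs per core            */
    const double working_set_mb = 0.5;    /* hypothetical per-thread set    */

    double effective_l2 = l2_per_cpu_mb * cpus_per_node;   /* 14 MByte */
    double demand_local = smt_threads * working_set_mb;    /* per core */

    printf("effective L2 across node: %.2f MByte\n", effective_l2);
    printf("per-core SMT demand:      %.2f MByte (%s the local 1.75 MByte L2)\n",
           demand_local, demand_local > l2_per_cpu_mb ? "exceeds" : "fits in");
    return 0;
}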

Rob