Technology Stocks : Intel Corporation (INTC)


To: Tenchusatsu who wrote (135764)    5/21/2001 4:15:39 PM
From: Rob Young
 
Tench,

"You still haven't answered the question of whether existing apps can be run in SMT mode without modification. It
seems like they should be able to (even the spinning semaphore problem won't prevent this), but is there something I'm
missing here?"

Absolutely. If modification were required, that means existing binaries are
broken. If existing binaries are broken or run very poorly, that means you
are in big trouble. IA64 notwithstanding ;-)

Also, if you check out the DB paper, you'll see they are using Oracle and don't
mention mods; likewise, the OS paper is using Apache. Now, the OS does indeed
need to be modified, but not existing applications.

Could applications be modified to make better use of SMT? I would imagine so.
But shouldn't the OS be the one to decide (where to schedule, which thread, etc.)?
Besides, if the app is already threaded ... the beauty and the payoff may be that vendors
get great scaling from SMT without modification. Surely that would make more
than a few happy. But again, we are seeing 4 PCs (program counters) per EV8: four separate threads in
flight at one time, and when one stalls or is waiting on a resource it goes back on the queue
so the others keep the ALUs busy. I won't explain it as well as this:

alphapowered.com

Look at slides 12, 13, and 14 also ... to answer your question about
modification. Look at slide 17 to see SPECfp95 speedups for
various members of that suite. You see a doubling in Tomcatv performance
with 4 threads; Swm256 increased 50% with 4 threads.

Threaded applications and their speedups are shown in slide 18. The speedups
are much greater for some.
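
For anyone who wants to play with the "several threads in flight, a stalled one steps aside so the others keep the ALUs busy" idea, here is a rough toy model in Python. It is only a sketch under made-up assumptions: the issue width, the per-thread ILP limit, the stall probability, and the stall length are illustrative numbers, not EV8 specifications, and the function name `simulate` is hypothetical. Each cycle, every hardware context that is not stalled may fill a few of the shared issue slots, and after issuing it may stall for a fixed number of cycles (think cache miss).

```python
import random


def simulate(num_contexts, issue_width=8, per_thread_ilp=2,
             stall_prob=0.1, stall_len=10, cycles=20_000, seed=0):
    """Toy SMT model: return the fraction of issue slots that were used.

    All parameters are illustrative assumptions, not real chip data.
    """
    rng = random.Random(seed)
    # Cycle number at which each hardware context is ready to issue again.
    ready_at = [0] * num_contexts
    used_slots = 0

    for cycle in range(cycles):
        free = issue_width  # issue slots still open in this cycle
        for ctx in range(num_contexts):
            if free == 0:
                break  # all slots taken this cycle
            if ready_at[ctx] > cycle:
                continue  # this thread is stalled; let another one issue
            # A single thread can only fill a few slots (limited ILP).
            issued = min(per_thread_ilp, free)
            free -= issued
            used_slots += issued
            # After issuing, the thread may stall for stall_len cycles.
            if rng.random() < stall_prob:
                ready_at[ctx] = cycle + 1 + stall_len

    return used_slots / (cycles * issue_width)


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(f"{n} context(s): slot utilization {simulate(n):.2f}")
```

In this model a single thread can never use more than `per_thread_ilp` of the `issue_width` slots, and it uses none while stalled, so adding contexts raises utilization without changing anything about the "application" threads themselves. The real measured speedups are in the slides; this only shows the mechanism.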

" Anyway, there are many ways to exploit TLP. The traditional way is to go with multiple processors, but that could get
limited by the processor bus (or whatever interconnect fabric is used). The next step is by having the same die run two
or more threads at one time. This is accomplished in one of two ways: multiple cores on a die sharing the same L2 or
L3 cache (also known as CMP, i.e. the IBM Power4 method), or having a single core run multiple threads concurrently,
a.k.a. SMT. Interestingly enough, CMP and SMT aren't mutually exclusive, so in the far future, we could see
processors out there which have multiple cores per die and multiple threads per core."

Correct. EV9 or EV10 you have in mind there??? :-O

Rob