Rob, thanks for your post. There are still a couple of minor points:
- You still haven't answered whether existing apps can run in SMT mode without modification. It seems like they should be able to (even the spinning-semaphore problem, where a busy-wait loop ties up execution resources the other thread could use, won't prevent this), but is there something I'm missing here?
- In response to your statement, "TLP? Now you are off on another angle. I'm not so sure about TLP and SMT and how they differ or are the same." ... Thread-level parallelism (TLP) is just a term for the parallelism available in tasks that can be broken down into separate threads of execution, and exploiting it is one way to increase performance on those tasks. You and I are talking specifically about server apps, which already exhibit high levels of TLP. That's probably why it was confusing, since high TLP is pretty much a given in that area.
Anyway, there are many ways to exploit TLP. The traditional way is to use multiple processors, but that can be limited by the processor bus (or whatever interconnect fabric is used). The next step is to have a single die run two or more threads at once. This can be done in one of two ways: multiple cores on one die sharing the same L2 or L3 cache (chip multiprocessing, or CMP, as in the IBM Power4), or a single core running multiple threads concurrently, a.k.a. SMT. Interestingly enough, CMP and SMT aren't mutually exclusive, so in the far future we could see processors with multiple cores per die and multiple threads per core.
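To make the TLP idea concrete, here is a minimal Rust sketch (the name `parallel_sum` is hypothetical, for illustration only): one task is split into independent chunks, and each chunk is summed on its own thread. The worker count comes from `std::thread::available_parallelism()`, which returns an estimate of the logical processors available to the program; on an SMT-enabled machine each hardware thread typically shows up as one logical processor, which is why ordinary multithreaded code can use SMT without being rewritten.

```rust
use std::thread;

/// Hypothetical helper: splits `data` into roughly equal chunks and sums
/// each chunk on its own scoped thread, then combines the partial sums.
pub fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Ceiling division; `.max(1)` keeps `chunks` from panicking on empty input.
    let chunk = ((data.len() + workers - 1) / workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk)
            .map(|part| s.spawn(move || part.iter().sum::<u64>()))
            .collect();
        // Wait for every thread and add up the partial results.
        handles.into_iter().map(|h| h.join().unwrap()).sum::<u64>()
    })
}

fn main() {
    // Logical processors as reported by the OS (an estimate); fall back to 1.
    let workers = thread::available_parallelism().map(|n| n.get()).unwrap_or(1);
    let data: Vec<u64> = (1..=1_000_000).collect();
    println!("{} workers, sum = {}", workers, parallel_sum(&data, workers));
}
```

The same code runs whether those logical processors come from separate chips, separate cores on one die (CMP), or SMT threads on one core; only the achieved speedup differs.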
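On the question of running existing apps unmodified, here is a minimal Rust sketch of a test-and-set spin lock, assuming the "spinning semaphore problem" refers to busy-wait loops on one hardware thread consuming execution resources that the sibling thread on the same SMT core could use. The lock is correct with or without the `std::hint::spin_loop()` call; that hint (documented by Rust as emitting `pause` on x86) only makes the wait cheaper for the other thread. `SpinLock` and `contended_count` are hypothetical names, and this is an illustration, not a production-quality lock.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical test-and-set spin lock, for illustration only.
pub struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    pub const fn new() -> Self {
        SpinLock { locked: AtomicBool::new(false) }
    }

    pub fn lock(&self) {
        // Try to flip false -> true; on failure, spin until the flag looks free.
        while self
            .locked
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            while self.locked.load(Ordering::Relaxed) {
                // Busy-wait hint: correctness does not depend on it, but on an
                // SMT core it lets the sibling hardware thread use shared resources.
                std::hint::spin_loop();
            }
        }
    }

    pub fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}

/// Hypothetical demo: `threads` threads each increment a shared counter
/// `per_thread` times under the spin lock; returns the final count.
pub fn contended_count(threads: usize, per_thread: usize) -> usize {
    let lock = Arc::new(SpinLock::new());
    let counter = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let lock = Arc::clone(&lock);
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    lock.lock();
                    // Deliberately non-atomic read-modify-write; the lock makes it safe.
                    let n = counter.load(Ordering::Relaxed);
                    counter.store(n + 1, Ordering::Relaxed);
                    lock.unlock();
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}

fn main() {
    println!("{}", contended_count(4, 10_000)); // prints 40000
}
```

Removing the `spin_loop()` line would still produce the correct count; it would just waste more of the core while waiting.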
Tenchusatsu