Jackson Technology can be implemented without multiple distinct threads. Jackson Technology includes such things as automatic launch of concurrent threads along both paths of an unresolved branch, and sophisticated pipelining of data to minimize stalls due to the length of time a data access can take...
I had never understood the distinction before and thought any CPU implementing Jackson Technology had to implement full simultaneous multithreading (SMT), which implies that the OS thinks it is feeding instructions to two CPUs. Jackson Technology as you describe it could just be considered part of instruction-level parallelism (ILP). While true SMT requires a set of registers for each thread, I don't believe this is the case for Jackson Technology "threads" created by trying to execute both possibilities of a conditional branch -- it just needs some additional registers allocated from a pool to start the new thread. Even if one of the two threads stalls because it needs a register and there aren't any more, eventually the conditional branch will get resolved, either terminating the waiting "thread" or releasing the registers used by the other one.
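To make the shared-pool idea concrete, here is a rough sketch in Python. It is a toy model of my own, not a description of how any real CPU allocates registers; the names `RegisterPool`, `allocate`, and `resolve` and the path labels are all made up for illustration.

```python
class RegisterPool:
    """Toy model: both paths of a branch draw registers from one shared pool."""

    def __init__(self, size):
        self.free = list(range(size))   # physical registers not in use
        self.owned = {}                 # path name -> registers it holds

    def allocate(self, path):
        """Give `path` one register, or return None if it has to stall."""
        if not self.free:
            return None
        reg = self.free.pop()
        self.owned.setdefault(path, []).append(reg)
        return reg

    def resolve(self, loser):
        """Branch resolved: drop the losing path and reclaim its registers."""
        self.free.extend(self.owned.pop(loser, []))


pool = RegisterPool(3)
pool.allocate("taken")
pool.allocate("taken")
pool.allocate("not_taken")

# The pool is empty, so the next request stalls instead of failing outright.
stalled = pool.allocate("not_taken") is None

# Suppose the branch resolves as not taken: the "taken" path is discarded,
# its registers return to the pool, and the stalled path can continue.
pool.resolve("taken")
resumed = pool.allocate("not_taken")
```

The point of the sketch is only that a stall on an empty pool is temporary: resolving the branch always frees something, one way or the other.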
I can also see Jackson Technology being able to create multiple "threads" from inline code which has no branches and apparently no dependencies. For example, if a compiler generated five instructions manipulating the AX register followed by five instructions manipulating BX, possibly including loads from memory but no stores, then either set of code could run first, or the instructions could even be interleaved. By treating these as separate "threads" it is possible to prevent cache-miss stalls, or, if the CPU has multiple integer execution units, possibly run both "threads" in parallel.
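Here is a small Python sketch of how the AX/BX case could be split into independent "threads". It is only an illustration under my own assumptions: each instruction is listed with the set of registers it touches, every instruction names at least one register, and there are no stores, so loads never conflict. The function name `split_independent_chains` and the sample instructions are hypothetical.

```python
def split_independent_chains(instructions):
    """Group instructions into chains that share no registers.

    Each instruction is (text, registers), where registers is the set of
    register names it reads or writes. Program order is kept inside a chain.
    """
    parent = {}

    def find(reg):
        parent.setdefault(reg, reg)
        while parent[reg] != reg:
            parent[reg] = parent[parent[reg]]  # path halving
            reg = parent[reg]
        return reg

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Registers used by the same instruction must end up in the same chain.
    for _, regs in instructions:
        ordered = sorted(regs)
        for other in ordered[1:]:
            union(ordered[0], other)

    # Collect instructions by the chain their registers belong to.
    chains = {}
    for text, regs in instructions:
        root = find(sorted(regs)[0])
        chains.setdefault(root, []).append(text)
    return list(chains.values())


program = [
    ("mov ax, [x]", {"ax"}),
    ("add ax, 1", {"ax"}),
    ("shl ax, 2", {"ax"}),
    ("mov bx, [y]", {"bx"}),
    ("sub bx, 3", {"bx"}),
]
chains = split_independent_chains(program)

# One instruction that touches both registers ties the two chains together.
merged = split_independent_chains(program + [("add ax, bx", {"ax", "bx"})])
```

With the sample program, the AX and BX instructions land in two separate chains that could run in either order or interleaved; adding a single instruction that uses both registers collapses them back into one.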
In fact, the more I think about it, Jackson Technology without full SMT is probably a better model than SMT without Jackson Technology, primarily because it should speed up all compiler-generated code, not just code that the programmer has organized to be explicitly multi-threaded.
Also, this kind of parallelism avoids per-CPU charges, since the operating system (EDIT - and Bill Gates, HA!) doesn't have to know a thing about the little "threads" that the CPU is creating dynamically, all by itself.
Petz