To: tejek who wrote (122832) 8/22/2000 3:09:06 PM From: Petz

ted, how Transmeta CPU works:

<ref: So that I understand Transmeta and its product...it substitutes software for some of the functions normally performed by a chip? That's why power consumption is less. Is that correct?>

That's Transmeta's way of describing it, but it is a little misleading. EVERYTHING eventually gets done in hardware. But the hardware execution unit of the TM CPU is simple, small, low power and high speed. It is a RISC (reduced instruction set) CPU designed to be fast and efficient.

The essential concept to understand is that the TM CPU translates the x86 code found in your software on the fly into the simpler RISC instruction set of the core CPU (let's call it "TM code"), but in most cases each x86 instruction is only translated once into "TM code."

The translation of x86 code into TM code is not done by a big hardware engine inside the TM CPU; it is done by a software program which once in a while interrupts the execution of already translated code so that it can translate new chunks of code. Thus, the TM CPU is jumping back and forth between translating code and executing the code it has already translated. Since the translation is done by sophisticated microcode inside the TM CPU, it can do optimizations that would be impossible via hardware translation.

An Athlon also translates x86 code into a simpler code referred to as RISC86. The difference is that the Athlon does this all in hardware and doesn't try to optimize the RISC86 code. But the hardware that executes the RISC86 instructions is very sophisticated and capable of doing instructions out of order, skipping instructions that aren't really necessary, looking ahead to see if there's something the CPU can do, etc.

If a program just consisted of a sequence of instructions without any loops, the TM CPU would be extremely inefficient.
It would spend a good deal longer translating the program into its little bite-size TM instructions than it would take for even a 486 to execute the program.

But if a program has a lot of short loops, the TM design makes some sense. Once a loop's code is translated, it may run 1000 times, and during that time, nothing will interrupt the execution of the loop. It will be running on simple hardware which doesn't do anything fancy like speculative execution, out-of-order execution, etc., because the code morpher used some sophisticated algorithms to translate the x86 code of the loop into the minimum number of "TM code" instructions.

There are some disadvantages to this approach; so far I've made it sound like a panacea. For one, fast floating point requires a total hardware implementation, and fast FPUs take a lot of power. The Crusoe isn't designed to do floating point super fast. Second, if a program jumps all over the place, then it is less likely that the translated code will be in the cache, which means that code may have to be re-morphed many times. This would happen, for example, on benchmarks that do a lot of multitasking. Finally, it's a poor choice for anything that requires a predictable response time, because when and how often the execution of a program has to be interrupted to do some more "morphing" is unpredictable.

PS, see also arstechnica.com

Petz
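
A rough way to see the translate-once trade-off described above is a toy cost model. This is only a sketch under made-up assumptions, not how Crusoe actually works: the names (ToyMorpher, simulate), the cost figures (50 units to translate a block, 1 unit to run it) and the oldest-first eviction of the translation cache are all invented for illustration.

```python
from dataclasses import dataclass, field

# Assumed, made-up costs in arbitrary units (not real Crusoe figures).
TRANSLATE_COST = 50  # translating one block of x86 code into "TM code"
EXECUTE_COST = 1     # running one already-translated block


@dataclass
class ToyMorpher:
    """Hypothetical translate-once model with a small translation cache."""
    capacity: int                      # how many translated blocks fit in the cache
    cache: dict = field(default_factory=dict)
    translations: int = 0              # how many times we had to stop and translate
    cost: int = 0                      # total cost so far

    def run_block(self, addr: int) -> None:
        if addr not in self.cache:
            # Execution is interrupted here to translate a new chunk of code.
            if len(self.cache) >= self.capacity:
                # Evict the oldest translation (dicts keep insertion order).
                oldest = next(iter(self.cache))
                del self.cache[oldest]
            self.cache[addr] = f"tm_code@{addr}"
            self.translations += 1
            self.cost += TRANSLATE_COST
        # Translated code is now available, so execute it.
        self.cost += EXECUTE_COST


def simulate(trace, capacity=4):
    """Run a sequence of block addresses; return (translations, total cost)."""
    morpher = ToyMorpher(capacity=capacity)
    for addr in trace:
        morpher.run_block(addr)
    return morpher.translations, morpher.cost


# Three traces of 1000 block executions each:
tight_loop = [0] * 1000              # one short loop run 1000 times
straight_line = list(range(1000))    # no loops, every block is new
jumpy = list(range(8)) * 125         # 8 blocks cycling through a 4-entry cache

print(simulate(tight_loop))     # (1, 1050)
print(simulate(straight_line))  # (1000, 51000)
print(simulate(jumpy))          # (1000, 51000)
```

With these invented numbers the tight loop pays for translation once and then runs uninterrupted, while the straight-line program and the program that jumps around both end up translating on every block, which is the re-morphing problem mentioned above.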