To: Ian@SI who wrote (4460) 6/4/1998 11:58:00 AM
From: Shibumi
>>Without RMBS or RMBS like speed, the processor - memory interface becomes an overwhelming bottleneck with the 64 bit architecture. Prior to 64 bit, there are already existing alternatives that don't limit the processor.<<

Please have patience with me, I must be terribly confused. I've never designed a 64-bit computer so I'm probably just ignorant -- but I have designed lots of 16- and 32-bit computers, so let me lay out the situation as I see it and you can tell me what I'm missing.

When a microprocessor is deemed "64-bit", this is a designation of its address space capabilities. The line size of a microprocessor -- the amount of data brought into its cache when data sought by the microprocessor is not already there -- does not depend on the address space; it is a function of the cache/memory architecture. Line sizes have been growing along with cache sizes, limited, of course, by the penalty of anticipatorily fetching the data surrounding what the microprocessor actually addressed. On a non-tenured (normal, these days) processor/memory bus, this anticipatory fetch penalty is a statistical tradeoff between bus signaling overhead and actual data transfer time.

So when any microprocessor issues an instruction that isn't in its cache, or an instruction references data not in its cache, that instruction stalls while the cache line is filled. Designers build superscalar processors in an effort to keep as many instructions in flight as possible -- so if one instruction stalls, others may continue, assuming they hit in the cache.

Ideally, if you're a microprocessor designer, you'd like your processor/memory bus to run at least synchronously with your microprocessor. The reason is to reduce the penalties for cache misses -- so your microprocessor stays busier.

I can see Merced having two features that bear on cache misses (and thus on the processor/memory interface). The first is that, as I understand it (and I have absolutely no proprietary information on this), the new instruction set is fixed-length (RISC-like), so each instruction will be less cache-efficient -- take more room in the cache -- than the more compact x86 instructions. The second is the EPIC architecture, which should yield a greater degree of internal parallelism, and thus a greater ability to keep running instructions (assuming they are in the right order and have no inappropriate dependences between them) even when one instruction stalls.

The larger instruction length would seem to require more processor/memory bandwidth, and the greater parallelism would seem to require some indeterminate increment beyond that. Of course, the real impact of these features can't be known until you actually run code through a simulation of the device, or the device itself. But neither seems to be a major factor in any of the simulations I've seen of the theoretical penalties and advantages of 16-, 32-, 64-, or 128-bit microprocessors.

So -- what am I missing here with regard to your statement that the processor/memory interface becomes an overwhelming bottleneck with a 64-bit architecture?
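To make that line-size tradeoff concrete, here is the back-of-the-envelope model I carry around for it, written out as a little C program. Every number in it is hypothetical -- the signaling overhead, transfer time, and miss rate are made up for illustration, not measured from any real part -- and the only point is the shape of the arithmetic: a fixed per-miss overhead amortized over a longer and longer burst.

/* Toy model of the anticipatory fetch penalty described above.
 * Miss penalty = fixed bus signaling overhead + burst transfer
 * time for one cache line; effective CPI then follows from the
 * miss rate.  All constants are hypothetical. */
#include <stdio.h>

int main(void)
{
    const double base_cpi        = 1.0;  /* CPI with a perfect cache (assumed) */
    const double miss_rate       = 0.02; /* misses per instruction (assumed)   */
    const double bus_overhead    = 6.0;  /* CPU cycles of signaling per miss   */
    const double cycles_per_xfer = 2.0;  /* CPU cycles per bus transfer        */
    const int    bus_width       = 8;    /* bytes moved per transfer           */

    for (int line = 16; line <= 128; line *= 2) {
        double transfers    = (double)line / bus_width;
        double miss_penalty = bus_overhead + transfers * cycles_per_xfer;
        double cpi          = base_cpi + miss_rate * miss_penalty;
        printf("line %3d bytes: miss penalty %5.1f cycles, effective CPI %.2f\n",
               line, miss_penalty, cpi);
    }
    return 0;
}

With the miss rate held constant, longer lines only look worse; in reality a longer line also lowers the miss rate (spatial locality) until cache pollution takes over -- which is exactly the statistical tradeoff I meant above.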
I'm still stuck on the fact that with non-trivial workloads (applications whose footprints don't fit into the primary caches of today's microprocessors), these fast 32-bit microprocessors would do well to have synchronous (i.e., 400MHz and higher) processor/memory interfaces -- which leads, of course, to much faster memory devices. Thanks for taking the time to explain this to me.
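P.S. For anyone who wants to check me on that bandwidth claim, here is the rough arithmetic, again in C with purely hypothetical round numbers -- the clock, sustained IPC, miss rate, and line size are all assumptions, not measurements of any shipping part:

/* Sustained line-fill bandwidth a fast 32-bit part might demand.
 * All constants are hypothetical round figures. */
#include <stdio.h>

int main(void)
{
    const double clock_hz   = 400e6; /* core clock (assumed)           */
    const double ipc        = 1.5;   /* sustained instructions/cycle   */
    const double miss_rate  = 0.02;  /* cache misses per instruction   */
    const double line_bytes = 64.0;  /* bytes fetched per miss         */

    double bytes_per_sec = clock_hz * ipc * miss_rate * line_bytes;
    printf("sustained demand: %.0f MB/s of line fills\n",
           bytes_per_sec / 1e6);
    /* 400e6 * 1.5 * 0.02 * 64 = 768 MB/s -- already beyond the
     * ~528 MB/s peak of a 64-bit, 66MHz bus. */
    return 0;
}

Even with those modest figures, the sustained demand exceeds the peak of a 64-bit, 66MHz bus, which is why I keep coming back to much faster memory devices.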