To: Joey Smith who wrote (108228) 8/24/2000 11:51:37 AM From: pgerassi

Dear Joey:

Scumbria is not wrong. If the instructions are not in the trace cache (which takes a cycle or two to determine), they have to be decoded, and that pipe takes 8 more cycles before it starts putting the decoded micro ops into the cache. Furthermore, since the decode side is slower than the execute side, there will be further bubbles (waits of a cycle or more) in the execute pipe.

This is borne out by the huge size of the trace cache. Originally it was supposed to hold no more than a few hundred micro ops. It has since ballooned to 12 thousand. This is probably why the die grew from 170 mm2 to 217 mm2, an increase of 47 mm2, or about half a P3 die in area. That is roughly 192KB of cache, which works out to 16 bytes per decoded micro op, making the trace cache 24 times bigger than the 8KB L1 data cache. Since the L1 instruction and data caches have usually been equal in size (true from the Pentium up to the P3), the original trace cache was 8KB, or 512 micro ops. This fits with the original data.

Thus, Intel felt a very strong need to increase the trace cache size. This suggests that the IPC loss in practice was much higher than the original design goal of 10 to 20%; it was probably more like 30 to 50%. With the clock speed gains about as planned, that was unacceptable. The much larger trace cache probably cut the loss to around 20%, which Scumbria feels is the best they could do. This is borne out by the comments quoted in some of the articles referred to. It also implies that they needed a wider decode stage, and more width elsewhere, to better balance the design (quick and dirty fixes). Even with all these changes, they still feel the need to redo the benchmarks because the performance is nowhere near the gains they were looking for.
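For anyone who wants to check the back-of-the-envelope arithmetic, here is a minimal Python sketch. It assumes the estimates from the post itself (170 mm2 to 217 mm2 die, about 192KB of trace cache storage, 12K micro ops taken as 12 x 1024, 8KB L1 data cache), not any published Intel breakdown. The 1.5x clock ratio in the last part is purely a hypothetical number, chosen only to show why a 30 to 50% IPC loss hurts so much when performance scales roughly as IPC times clock speed.

```python
# Back-of-the-envelope check of the trace cache estimates.
# All inputs are the post's own estimates, not published Intel figures.

KB = 1024

def added_area_mm2(old_die_mm2: float, new_die_mm2: float) -> float:
    """Die growth the post attributes to the larger trace cache."""
    return new_die_mm2 - old_die_mm2

def bytes_per_uop(cache_bytes: int, uops: int) -> float:
    """Storage cost of one decoded micro op."""
    return cache_bytes / uops

def relative_performance(ipc_loss: float, clock_ratio: float) -> float:
    """Rough model: performance scales as IPC x clock frequency.

    ipc_loss is a fraction (0.2 means 20% fewer instructions per clock);
    clock_ratio is new clock / old clock.
    """
    return (1.0 - ipc_loss) * clock_ratio

area = added_area_mm2(170, 217)
print(f"added area: {area} mm2")

trace_cache = 192 * KB       # estimated trace cache storage
uops = 12 * KB               # "12 thousand" micro ops, taken as 12 * 1024
per_uop = bytes_per_uop(trace_cache, uops)
print(f"bytes per micro op: {per_uop}")

l1_data = 8 * KB
print(f"trace cache / L1 data cache: {trace_cache / l1_data}")

# Original design: trace cache the same size as the 8KB L1 data cache.
print(f"original trace cache: {l1_data / per_uop:.0f} micro ops")

# HYPOTHETICAL 1.5x clock gain, for illustration only.
for loss in (0.10, 0.20, 0.30, 0.50):
    print(f"IPC loss {loss:.0%}: net {relative_performance(loss, 1.5):.2f}x")
```

With these inputs the script reproduces the 47 mm2, 16 bytes per micro op, 24x and 512 micro op figures, and shows that at a 50% IPC loss even a 1.5x clock gain leaves the chip slower than its predecessor.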
All of this was predicted by Scumbria, me, and others. The marketing department may also have nixed the changes needed to fix the real problem (probably the double-clocked ALU that Scumbria, and I, point to), because it is a great gee-whiz feature needed to sell this new generation, and because eliminating it would be an embarrassment. (Not to me: many times a good-sounding idea does not pan out.)
Pete