Politics : Formerly About Advanced Micro Devices


To: Tenchusatsu who wrote (57112) 5/3/1999 5:58:00 PM
From: Scumbria
 
Ten,

Just to clarify things, guys. The larger the L1 cache, the slower it is. The Pentium II and Pentium III are still sticking to a 32K L1 cache, which is why the P6 core doesn't need to add clocks to the L1 cache access.

The small P6 cache helps the situation, but still implies that clock speed is limited by L1 cache access. Who knows how fast P6 would run with an additional pipeline stage for L1 access?

Your statement that a large cache is slower addresses only part of the L1 problem. L1 cache speed has two major issues: one is tag lookup/compare, and the other is data array access (which is largely a function of the array size). Adding clocks for L1 access fixes both of these problems.
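
To put rough numbers on the point about adding clocks, here is a minimal sketch of the arithmetic. It assumes the clock period is set by the slowest pipeline stage and ignores latch overhead. The stage names and delays are made up for illustration and are not P6 or K7 figures; `max_clock_mhz` and `split_stage` are hypothetical helpers.

```python
def max_clock_mhz(stage_delays_ns):
    """Clock frequency in MHz, limited by the slowest stage (delays in ns)."""
    return 1000.0 / max(stage_delays_ns.values())


def split_stage(stage_delays_ns, name, ways):
    """Return a copy of the pipeline with stage `name` split evenly into `ways` stages.

    Idealized: real designs rarely split evenly and pay some latch overhead.
    """
    out = {}
    for stage, delay in stage_delays_ns.items():
        if stage == name:
            for i in range(ways):
                out[f"{stage}_{i + 1}"] = delay / ways
        else:
            out[stage] = delay
    return out


# Hypothetical per-stage delays in nanoseconds (illustrative only).
stages = {"fetch": 1.5, "decode": 1.6, "execute": 1.8, "l1_access": 2.5}

base_mhz = max_clock_mhz(stages)                               # L1 access is the limiter
piped_mhz = max_clock_mhz(split_stage(stages, "l1_access", 2))  # execute becomes the limiter

print(f"single-cycle L1: {base_mhz:.0f} MHz")
print(f"two-cycle L1:    {piped_mhz:.0f} MHz")
```

In this toy pipeline the L1 access stops being the critical path once it gets a second clock. The price, which the sketch does not model, is an extra cycle of load latency.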

I expect that K7 MHz will cause Intel a lot of problems.

Scumbria



To: Tenchusatsu who wrote (57112) 5/4/1999 10:54:00 PM
From: grok
 
re: <Intel already said that one of the things Willamette will feature is an instruction trace cache.>

A trace cache stores instructions in the order in which they are executed instead of the order in which they are stored in memory. It is effective in wide-issue superscalar designs. When you're issuing 6 or more instructions, on average you've got at least one taken branch in the group, and all instructions after the taken branch are useless with a conventional cache. If you're issuing 12 instructions, you've probably got two taken branches in the group, making 12-issue very ineffective with a conventional cache. The trace cache allows you to issue 12 instructions from the execution stream (assuming that execution is repetitive).
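
The fetch argument can be sketched with a toy model. It assumes a conventional cache delivers sequential instructions only up to the first taken branch in a group, and that the trace cache always hits (that is, execution is repetitive). Cache line alignment and misses are ignored, and the function names and the one-taken-branch-per-6-instructions stream are illustrative assumptions, not measurements.

```python
def conventional_fetch_cycles(stream, width):
    """Cycles to fetch `stream` when each fetch group ends at a taken branch.

    `stream` is the dynamic instruction sequence as a list of bools:
    True means that instruction is a taken branch.
    """
    cycles = 0
    in_group = 0
    for taken in stream:
        in_group += 1
        # The group is closed by a taken branch or by reaching the issue width.
        if taken or in_group == width:
            cycles += 1
            in_group = 0
    if in_group:
        cycles += 1  # partial final group
    return cycles


def trace_cache_fetch_cycles(stream, width):
    """Cycles to fetch `stream` from a trace cache that always hits.

    Instructions are stored in execution order, so taken branches do not
    cut a fetch group short.
    """
    return -(-len(stream) // width)  # ceiling division


# Hypothetical stream: one taken branch every 6 instructions, 120 total.
stream = ([False] * 5 + [True]) * 20

conv = conventional_fetch_cycles(stream, 12)
trace = trace_cache_fetch_cycles(stream, 12)
print(f"conventional: {len(stream) / conv:.1f} instructions/cycle")
print(f"trace cache:  {len(stream) / trace:.1f} instructions/cycle")
```

With this stream the conventional 12-wide fetch delivers only 6 instructions per cycle, while the idealized trace cache delivers the full 12.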

Intel has been investigating trace caches for IA-64 since they plan very wide issue (I think Merced is 6-issue and that the next one is 12-issue). It looks like they decided that it was also useful for x86.