Politics : Formerly About Advanced Micro Devices


To: Scumbria who wrote (57231) 5/5/1999 12:12:00 AM
From: Tenchusatsu
 
<The trace cache will likely prove to be another in a long line of almost useless features from the minds of architectural dreamers.>

Just like that 9-issue execution unit that might not net that much more performance over a 6-issue execution unit, huh?

Tenchusatsu



To: Scumbria who wrote (57231) 5/5/1999 12:19:00 AM
From: grok
 
Re: <Why would a conventional cache be useless? Caches are completely random access.>
Of course, in a superscalar design you fetch and issue a group of consecutive instructions, since you don't have time to randomly access the branch destination if it turns out that there is a taken branch in the middle of the group.
The issued instructions that follow the branch are useless if they were fetched from a conventional cache. The trace cache gives you the instructions at the destination as part of the same group.
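
To make the difference concrete, here is a toy sketch of the two fetch styles. It is only an illustration under loud assumptions: the program is a list where each slot is either None (an ordinary instruction) or the target address of an always-taken branch, the trace cache always hits, and prediction is perfect. The function names and the 6-wide group are made up for the example and do not describe any real design.

```python
def useful_from_conventional(program, pc, width):
    """Count useful instructions when fetching `width` consecutive slots.

    Everything after the first taken branch in the group is wasted,
    because the fetch cannot jump to the target in the same cycle.
    """
    useful = 0
    for addr in range(pc, min(pc + width, len(program))):
        useful += 1
        if program[addr] is not None:  # taken branch: rest of group is useless
            break
    return useful


def useful_from_trace(program, pc, width):
    """Count useful instructions when the group follows the dynamic path.

    Assumes (hypothetically) that the trace is already cached, so the
    instructions at the branch target arrive in the same group.
    """
    useful = 0
    addr = pc
    while useful < width and addr < len(program):
        useful += 1
        target = program[addr]
        addr = target if target is not None else addr + 1
    return useful


# 16 instructions, with a taken branch at address 2 that jumps to address 10.
program = [None] * 16
program[2] = 10

print(useful_from_conventional(program, pc=0, width=6))
print(useful_from_trace(program, pc=0, width=6))
```

With the branch in the third slot of a 6-wide group, the conventional fetch delivers 3 useful instructions and the idealized trace fetch delivers all 6. A real trace cache misses, mispredicts, and has limited capacity, so the gap in practice would be smaller.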

<What the trace cache does do is allow you to speculatively prefetch without an address calculation.>

Yes, that's one way of looking at it. You can prefetch the destination instructions in the same group as the branch, all at the same time.

<You will be hard pressed to find an x86 CPU which averages much more than 1 instruction per clock.>

This is true when you examine unfriendly workloads (which may be most real workloads). It is also true for RISC processors. (Dick Sites showed this on the 4-issue 21164, which averaged 1 instruction per cycle.) The potential for IA-64 is that it may increase this number from 1 all the way up to something like 2. Yes, it may seem ridiculous that a 6- or 12-issue processor would only average 2 but, actually, it would be a big breakthrough. That would be significantly better than what K7 will achieve.
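
The arithmetic behind that claim can be sketched in a few lines. The figures are the rough ones quoted in this post (1 instruction per cycle on a 4-issue machine, a hoped-for 2 on a 6- or 12-issue machine), not measurements, and the helper name `slot_utilization` is made up for the example.

```python
def slot_utilization(average_ipc, issue_width):
    """Fraction of issue slots that do useful work, on average."""
    return average_ipc / issue_width


# (label, average instructions per cycle, issue width); rough figures only.
cases = [
    ("4-issue at 1 IPC", 1, 4),
    ("6-issue at 2 IPC", 2, 6),
    ("12-issue at 2 IPC", 2, 12),
]

for label, ipc, width in cases:
    print(f"{label}: {slot_utilization(ipc, width):.0%} of issue slots used")

# Per-clock speedup from raising the average from 1 to 2 instructions.
print(f"per-clock speedup: {2 / 1:.1f}x")
```

Going from 1 to 2 instructions per cycle doubles the work done per clock, even though utilization stays low: 25% of slots on the 4-issue machine, about 33% on a 6-issue one, and about 17% on a 12-issue one.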

<The trace cache will likely prove to be another in a long line of almost useless features from the minds of architectural dreamers.>

I'm pretty surprised that it is being used in Willamette (if the rumor is true). But I guess that the huge price premium for a few percent of extra performance justifies it. In IA-64, however, it is very much required.