Politics : Formerly About Advanced Micro Devices


To: Scumbria who wrote (115650)6/12/2000 8:57:00 PM
From: Dan3
 
Re: Hope I'm not offending anyone, but I think the T-Bird L2 is a dog....

The impression I'm getting is that even Austin yields are better with Thunderbird. After what happened to the K6 when they added on-chip L2, I'm not disappointed by Thunderbird's per-clock performance. I think the jury's still out on binsplits until there's a challenge from Intel.

I also like the 750 chipset by the way - it's rock solid.

Let's let Intel go through yield crashes, binsplit disasters, and motherboard recalls to get an extra 3% performance at a given clock speed. I think AMD has already learned that particular lesson, and there's no need to repeat it.

The Thunderbird L2 can be manufactured easily, unlike the K6 L2 and apparently unlike the Coppermine L2.

Besides, isn't man's best friend a dog? And it appears that this dog can hunt. Or maybe it would be better to call it a mongoose since it seems to like hunting coppermines.

:-)

Regards,

Dan



To: Scumbria who wrote (115650)6/13/2000 10:20:00 AM
From: pgerassi
 
Dear Scumbria:

It's because you don't understand how the cache latency program works. One of its input parameters is the delta, i.e., the increment between data addresses. This delta keeps increasing: in the output I saw, each column's delta was twice that of the column immediately to its left. When the delta reached 64 bytes, the latency increased to the 20 cycles quoted.

Since the algorithm is a very tight loop that can stay in the instruction cache (the L1 instruction pipe) at all times, no L2 accesses are required for instruction fetches. All the activity is therefore in the L2, and because of the pipelining, each data access (one every CPU cycle) evicts an L1 cache line and loads a new line from the L2. That leaves no time for the L1-to-L2 bus to finish the swap of a cache line between L1 and L2. On a K75, the write-back of an L1 victim line to the L2 is unnecessary because the line is already duplicated in the L2, so the K75 doesn't need to write the victim back and shows lower latency on this particular piece of code. In general code, though, this advantage is optimized away, since this access pattern constantly thrashes the L1 and would actually run slower, to a large degree, on all the high-performance platforms.

This is the type of code I would use as a benchmark against a super-pipelined CPU like Willamette. This kind of code forces the CPU to let each data instruction run through its entire pipe before the next instruction can start, so an "unknowing customer" would conclude from that benchmark that the super-pipelined processor is a dud. You have always argued that some of your fellow CPU designers call this a severe limitation and almost always set the significance of this problem too high, and that more, better-balanced stages are better for CPU performance than fewer. But on this benchmark, it's possible that a low-latency CPU like a K6-3/400 would beat a high-latency CPU like Willamette even at 1.5 GHz, given the same memory speed (say, PC133 SDRAM). Yet in almost all applications, you would argue, Willamette at 1.5 GHz would "smoke" the K6-3/400.

Since I know that most code makes at least ten accesses to L1 per L1 miss in general, the L2 swap will be finished before a new swap is needed. The full-speed nature of the L2 interface then shows through, along with the benefits of the higher associativity and additional size, so a TBird will outrun a K75 in almost all cases. Once code is optimized to take advantage of TBird's cache architecture, the K75 will fall further behind. Remember that Duron will run better on such code as well (the benefit of having a high-volume, low-cost baby brother).

Pete