Politics : Formerly About Advanced Micro Devices


To: Process Boy who wrote (85076), 1/5/2000 1:51:00 AM
From: Scumbria
 
PB,

In your scenario of memory-latency-limited performance, do you have any WAGs as to how to attack "the problem"?

1. Huge caches!
2. ESDRAM main memory.
3. SRAM main memory (for those truly in need of speed).

Scumbria



To: Process Boy who wrote (85076), 1/5/2000 2:00:00 AM
From: ptanner
 
PB <In your scenario of memory-latency-limited performance, do you have any WAGs as to how to attack "the problem"?>

How about changing the way the software is written (hm... IA-64?), providing high-capacity memory bandwidth (RDRAM) for data-intensive apps, or using dedicated hardware for specialized tasks (the old FP co-processors, or graphics cards), provided they don't compete with the main memory path.

It seems the use of multiple memory hierarchies (L1, L2, L3) has provided real benefits, and perhaps FP registers instead of a stack would help? (I can't recall where I read this recently on the web, and I am not an EE, just a CE who enjoys computers and has CAD to thank for the opportunity to buy hot-rods.)
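As a crude illustration of the "change how the software is written" angle, here is a blocked matrix multiply in C. The idea is to work on tiles small enough to stay resident in cache instead of streaming everything from main memory on every pass. This is my own sketch, not anything from a compiler vendor, and the N and BS values are made up for illustration.

    /* Illustration only: loop blocking ("tiling") so each tile of A, B and C
       stays resident in cache instead of streaming from main memory.
       Assumes the caller has zeroed C.  N and BS are illustrative values. */
    #include <stddef.h>

    #define N  512
    #define BS 64   /* tile edge; chosen so three BS x BS tiles of doubles fit in L2 */

    void matmul_blocked(const double A[N][N], const double B[N][N], double C[N][N])
    {
        for (size_t ii = 0; ii < N; ii += BS)
            for (size_t kk = 0; kk < N; kk += BS)
                for (size_t jj = 0; jj < N; jj += BS)
                    /* multiply one BS x BS tile of A by one tile of B */
                    for (size_t i = ii; i < ii + BS; i++)
                        for (size_t k = kk; k < kk + BS; k++) {
                            double a = A[i][k];
                            for (size_t j = jj; j < jj + BS; j++)
                                C[i][j] += a * B[k][j];
                        }
    }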

These are my own WAGs...

PT



To: Process Boy who wrote (85076), 1/5/2000 4:40:00 AM
From: Saturn V
 
Ref - <In your scenario of memory-latency-limited performance, do you have any WAGs as to how to attack "the problem"?>

The problem can be addressed by prefetching data from main memory into L2. This has already been done via an explicit prefetch, using an SSE instruction, with the latest Intel compiler. Merced also attacks the problem by having the compiler issue an explicit prefetch before the data is needed.
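A rough sketch of what that explicit prefetch looks like in C, using the SSE prefetch intrinsic (_mm_prefetch from xmmintrin.h). The 16-element prefetch distance and the T1 hint are illustrative guesses on my part, not tuned values.

    /* Illustration only: explicit software prefetch a fixed distance ahead
       of the loop, using the SSE prefetch intrinsic. */
    #include <xmmintrin.h>
    #include <stddef.h>

    #define PF_DIST 16   /* elements ahead to prefetch; a guess, not a tuned value */

    double sum_with_prefetch(const double *a, size_t n)
    {
        double s = 0.0;
        for (size_t i = 0; i < n; i++) {
            if (i + PF_DIST < n)
                /* hint T1: pull the line into L2 before we actually need it */
                _mm_prefetch((const char *)&a[i + PF_DIST], _MM_HINT_T1);
            s += a[i];
        }
        return s;
    }

Which is exactly why the legacy-code point below matters: without a recompile, nothing in an old binary emits these instructions.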

However, legacy x86 code cannot benefit from the explicit prefetch unless it is recompiled, and most Coppermine benchmarks being run today are still using legacy code.

A superior technique would have been an implicit prefetch in hardware: the CPU could automatically preload L2 with the adjacent pages for any memory operand. This can cause cache thrashing (eviction of a page just before it is needed) if the L2 cache is small, but the approach may be practical for a large L2. I wonder if the Coppermine core has another surprise in store for the larger-cache derivatives, in the form of an automatic L2 preload.
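A toy model of that implicit-prefetch idea, just to make it concrete. Everything here (the line size, the direct-mapped 256 KB L2, preloading the adjacent cache line rather than a whole page) is a simplifying assumption of mine, not a description of any real core.

    /* Toy model of an implicit hardware prefetch: every demand access also
       pulls the *next* line into a small direct-mapped L2 model.  Line size,
       cache size, and the direct-mapped organization are all simplifying
       assumptions; a real core would differ. */
    #include <stdint.h>
    #include <stdbool.h>

    #define LINE_SIZE 32u       /* bytes per cache line (P6-era size) */
    #define L2_LINES  8192u     /* models a 256 KB direct-mapped L2 */

    static uint32_t l2_tag[L2_LINES];
    static bool     l2_valid[L2_LINES];

    static void l2_fill(uint32_t addr)
    {
        uint32_t line = addr / LINE_SIZE;
        uint32_t set  = line % L2_LINES;
        l2_tag[set]   = line;   /* may evict a line that is still needed (thrashing) */
        l2_valid[set] = true;
    }

    /* Called for every memory operand the CPU touches. */
    void access_with_implicit_prefetch(uint32_t addr)
    {
        l2_fill(addr);               /* demand fill */
        l2_fill(addr + LINE_SIZE);   /* implicit prefetch of the adjacent line */
    }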

Willamette may well have the automatic L2 preload, but if SSE-enabled compilers are widespread by the time of its introduction, the feature may be moot.