Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: Joe NYC who wrote (16121) 10/26/2000 2:06:07 PM
From: porn_start878
 
> I think this would be big. Are there any rumors if P4 has it?
I think there are more than rumors that the P4 has it, but I can't find any link.

... however, it would be far from as big as ISSE

Max



To: Joe NYC who wrote (16121) 10/26/2000 2:41:06 PM
From: jcholewa
 
> I think this would be big. Are there any rumors if P4 has it?

Intel explicitly stated at their Microprocessor Forum segment that P4 has hardware prefetching. Stride-type prediction was at the very least implied.

I personally think that HWP (as I call it) is huge, just as huge as "branch prediction" was when it first became prevalent in x86.

Mind you, I should note that there are downsides to HWP. For one thing, it obviously uses the cache+memory bandwidth to do its prefetching, so a badly designed HWP will hog the bandwidth and slow the system down. A well designed HWP will only use bandwidth when the rest of the system does not need it.
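
To make the stride idea and the bandwidth caveat concrete, here is a minimal sketch in Python. It is a toy model, not Intel's design: the class name `StridePrefetcher`, the `bus_idle` flag, and the "same stride seen twice" rule are all assumptions made for illustration. It predicts the next address only when the last two demand accesses were separated by the same nonzero stride, and it stays quiet whenever the bus is busy.

```python
class StridePrefetcher:
    """Toy stride predictor (hypothetical; not any real CPU's logic)."""

    def __init__(self):
        self.last_addr = None    # address of the previous demand access
        self.last_stride = None  # distance between the previous two accesses

    def observe(self, addr, bus_idle=True):
        """Record a demand access; return an address to prefetch, or None."""
        prediction = None
        if self.last_addr is not None:
            stride = addr - self.last_addr
            # Prefetch only when the same nonzero stride repeats AND the
            # bus is idle, so prefetches never compete with demand traffic.
            if stride != 0 and stride == self.last_stride and bus_idle:
                prediction = addr + stride
            self.last_stride = stride
        self.last_addr = addr
        return prediction


def run(addresses, bus_idle=True):
    """Feed a list of addresses through a fresh prefetcher."""
    pf = StridePrefetcher()
    return [pf.observe(a, bus_idle) for a in addresses]


if __name__ == "__main__":
    print(run([0, 64, 128, 192]))                  # regular 64-byte stride
    print(run([0, 64, 128, 192], bus_idle=False))  # bus saturated
    print(run([0, 64, 10, 500]))                   # no repeating stride
```

With a regular stride the sketch starts predicting on the third access; with a busy bus or an irregular pattern it issues nothing, which is the "well designed" behavior described above.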

HWP is a great crutch for systems with high bandwidth (think "dual channel Direct Rambus DRAM + 256-bit-per-clock L2 cache") but high latency (think "DRDRAM" again) memory subsystems. HWP in this case will use the extra bandwidth to stave off some of the bad latency vibes.

A theoretical Athlon with HWP would also benefit immensely. PC2100 DDR SDRAM doubles available bandwidth over PC133 SDRAM, but latency is not improved and it remains a bottleneck in many situations.
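
The doubling can be checked with back-of-the-envelope arithmetic. This is a sketch using nominal figures only: a 133 MHz clock and a 64-bit (8-byte) module bus for both memory types, with DDR moving data on both clock edges. The helper name `peak_bandwidth_mb_s` is made up for illustration.

```python
def peak_bandwidth_mb_s(clock_mhz, bus_bytes, transfers_per_clock):
    """Theoretical peak bandwidth in MB/s (nominal, not measured)."""
    return clock_mhz * bus_bytes * transfers_per_clock


# Assumed nominal parameters: 133 MHz clock, 64-bit (8-byte) bus.
pc133 = peak_bandwidth_mb_s(133, 8, 1)   # single data rate
pc2100 = peak_bandwidth_mb_s(133, 8, 2)  # double data rate

if __name__ == "__main__":
    print(pc133, pc2100, pc2100 / pc133)
```

The clock is the same in both cases, so only the transfer rate doubles; the time to get the first word back does not shrink along with it.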

The i840 included a "prefetch cache", which is a more primitive precursor to HWP, I'm told. The prefetch cache goes a long way to explain why i840 gets much better SPEC scores than i820 (more so than the added bandwidth from the second memory channel, imho).

    -JC