Technology Stocks : Intel Corporation (INTC)


To: Tony Viola who wrote (109120) 8/31/2000 3:17:34 PM
From: JDN
 
Dear Tony: That reference is so far over my head I only get a headache reading it. I just want a computer that will be capable of handling voice recognition when the software is ready, and that will download pictures and music exceptionally well. I think I will just give my credit card to Tenchusatu and have him buy it for me!! JDN



To: Tony Viola who wrote (109120) 8/31/2000 3:21:42 PM
From: Diamond Jim
 
Tony, here is a post by Ibexx on the BRCM suit. Nicholas is more brash than McNealy.

From LATimes:

Intel Steps Up Legal Battle With Broadcom
Message 14309109



To: Tony Viola who wrote (109120) 8/31/2000 4:55:21 PM
From: Scumbria
 
Tony,

Damone seems to be quite the academic:

Q2) Won't the 8 KB dcache in Wilma (compared to 16 KB in P6 and 64 KB in K7) really hurt performance?

No. The L1 cache is a small part of the overall memory system in an MPU. In a chip like Wilma the L1 serves primarily as a bandwidth-to-latency "transformer" to impedance-match the CPU core to the L2 cache and reduce the effective latency of L2 and memory accesses. The big performance loser is going off chip to main memory, and the 256 KB L2 in Wilma is what is relevant to that, not the 8 KB dcache. An 8 KB cache is insignificant compared to the scale of the Wilma die and could easily have been made larger. I think the reason it isn't larger is that Intel wanted to hit a 2-cycle load-use penalty, and at the clock rate Wilma targets, a larger cache would be a speed path. An 8 KB dcache has a hit rate of around 92% and a 32 KB cache around 96%. A two-cycle 8 KB dcache beats a much larger three-cycle dcache for the vast majority of applications, given the rest of the Wilma memory system design.
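
A rough way to see that trade-off is a back-of-the-envelope average-memory-access-time comparison. The hit rates below are the 92%/96% figures quoted above; the 10-cycle L2 latency is an illustrative assumption, not a published Wilma number.

    # Back-of-the-envelope AMAT (average memory access time) comparison.
    # Hit rates are the 92%/96% figures quoted above; the 10-cycle L2
    # latency is an illustrative assumption, not a published Wilma figure.

    def amat(l1_latency, l1_hit_rate, l2_latency):
        # Average load latency in cycles, ignoring misses past the L2.
        return l1_latency + (1.0 - l1_hit_rate) * l2_latency

    small_fast = amat(2, 0.92, 10)   # 8 KB dcache, 2-cycle load-use
    large_slow = amat(3, 0.96, 10)   # hypothetical 32 KB dcache, 3-cycle load-use

    print("2-cycle 8 KB dcache : %.2f cycles/load" % small_fast)   # 2.80
    print("3-cycle 32 KB dcache: %.2f cycles/load" % large_slow)   # 3.40

Under those assumptions the small, fast cache averages fewer cycles per load, which is the argument being made.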

The cache info from IDF is quite interesting. The L1 dcache can perform a 128-bit wide load and store per clock cycle. According to Intel, the average cache latency of a 1.4 GHz Wilma is a little over half (55%) that of a 1 GHz PIII in ABSOLUTE TIME. On a clock-normalized basis the memory latency is only 77% of the P3's (0.55 x 1.4 = 0.77). That is right, boys and girls: a P6 memory access averages about 30% more *clock cycles* than a 1.4 GHz Wilma. How is that possible? Well, it isn't just the smaller/faster L1. Intel borrowed a neat trick from the Alpha EV6 core. The Wilma performs load data speculation in its pipeline and assumes the load hits in L1. If it doesn't, it executes the cache fill and then uses a replay trap to rerun the load. A football analogy: this is like a receiver running a downfield pattern and expecting the ball over the shoulder, sight unseen. It is a lot faster than stopping and waiting to catch the ball before running downfield.
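
A minimal sketch of the hit-speculation/replay-trap idea described above. The structure and cycle counts here are assumptions for illustration, not Wilma's actual pipeline.

    # Toy model of hit speculation with a replay trap, as described above.
    # Structure and cycle counts are illustrative assumptions only.

    L1_LATENCY = 2      # assumed load-use latency on an L1 hit
    FILL_LATENCY = 10   # assumed extra cycles to refill the line from L2

    def issue_load(addr, l1_cache):
        # Dependent ops are scheduled as if the load will hit in L1.
        cycles = L1_LATENCY
        if addr not in l1_cache:     # speculation was wrong: L1 miss
            cycles += FILL_LATENCY   # fetch the line from L2
            l1_cache.add(addr)       # fill the L1
            cycles += L1_LATENCY     # replay trap: rerun the load
        return cycles

    l1 = set()
    print(issue_load(0x100, l1))  # miss: 2 + 10 + 2 = 14 cycles
    print(issue_load(0x100, l1))  # hit: 2 cycles, no stall waiting on the tag check

The point of the trick is the common case: on a hit, nothing ever waits for the hit/miss determination; only a miss pays the extra replay cost.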


This is a wonderful (but flawed) theoretical analysis.

Suffice it to say:

1. The L1 miss rate on P4 is 8% (according to Damone). The L1 miss rate on Athlon is 1-2%. That means P4 suffers 4-8X as many misses as Athlon.

2. P4 has a 256 KB L2. Mustang has a 1 MB+ L2. P4 will have to go to DRAM 2-4 times as often as Mustang (see the rough arithmetic sketched after this list).

3. Everybody does load data speculation in the pipeline. The cache hit signal arrives simultaneously with the data on a cache hit, so it would be impossible to do anything else.
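
A rough sketch of the arithmetic behind points 1 and 2. The L1 miss rates are the ones quoted above; the L2 miss rates are illustrative assumptions, not measured figures.

    # Rough arithmetic behind points 1 and 2 above. L1 miss rates are the
    # quoted figures; the L2 miss rates are illustrative assumptions only.

    p4_l1_miss = 0.08                        # 8% (per Damone)
    athlon_l1_miss = (0.01, 0.02)            # 1-2%
    print(p4_l1_miss / athlon_l1_miss[1],
          p4_l1_miss / athlon_l1_miss[0])    # 4.0 8.0 -> 4-8x the L1 misses

    # DRAM traffic scales with the L2 miss rate. By the usual square-root
    # rule of thumb, quadrupling the L2 (256 KB -> 1 MB) roughly halves it.
    p4_l2_miss = 0.10                        # assumed for a 256 KB L2
    mustang_l2_miss = p4_l2_miss / 2.0       # assumed for a 1 MB+ L2
    print(p4_l2_miss / mustang_l2_miss)      # ~2x more trips to DRAM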

He appears to have a very good academic/marketing understanding of P4.

Scumbria