Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: wanna_bmw who wrote (53123), 8/29/2001 11:21:52 PM
From: Ali Chen
Bimmer, "applications are written such that cachelines aren't continually read along the same divisible boundaries. Any software designer that reads data along a 128KB boundary is an idiot."

Any application designer that uses "128k", "boundary", or any other explicit constant is an idiot. Usually they have never heard of these things, and never should need to. Applications are written in a virtual memory space that has almost nothing to do with cache lines and associativity boundaries, which live in physical address space. It is the compiler's job to map abstract constructs onto a particular computer architecture.
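
For illustration, here is a minimal sketch in C of what "ask, don't hard-code" looks like, assuming a Linux/glibc system where sysconf() can report the L1 data cache line size (other platforms expose this differently):

    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        /* Ask the OS for the L1 data cache line size instead of
           baking "64" or "128" into the application. */
        long line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
        if (line <= 0) {
            puts("L1D line size not reported on this system");
            return 1;
        }
        printf("L1D cache line: %ld bytes\n", line);
        return 0;
    }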

"In case you haven't noticed, I know a thing or two about cache design".
No, I haven't. Half a thing at most, maybe even less.

"Before a processor micro-architecture is built, extensive research is done by simulating actual software binaries"

That is maybe the only correct thing you said. Yes, tomorrow's processor architectures are frequently built for yesterday's applications. It is a known paradox.



To: wanna_bmw who wrote (53123), 8/30/2001 12:26:10 AM
From: Dan3
Re: Results from a Carnegie Mellon case study can't be applied to just any CPU architecture

I posted links to half a dozen papers.

Hunt down some more if you'd care to. In general, the number of sets becomes more important than the raw size of the cache above about 32 KB, and it takes a very large increase in cache size to make much of a difference. I also (getting in a shot) opined that the P4's perilously long pipeline might mean it benefits more from a larger cache than typical processors do.
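
To make the set count concrete: it follows directly from the geometry, sets = cache_size / (line_size * ways). A back-of-the-envelope sketch in C; the 64-byte line and the example geometries are assumptions for illustration, not any particular chip's spec:

    #include <stdio.h>

    /* sets = cache_size / (line_size * ways) */
    static long sets(long cache_bytes, long line_bytes, long ways)
    {
        return cache_bytes / (line_bytes * ways);
    }

    int main(void)
    {
        /* Doubling capacity at fixed line size and wayness
           doubles the number of sets... */
        printf("32 KB, 2-way  : %ld sets\n", sets(32 * 1024, 64, 2));   /* 256 */
        printf("64 KB, 2-way  : %ld sets\n", sets(64 * 1024, 64, 2));   /* 512 */
        /* ...while raising wayness at fixed capacity trades
           sets for associativity. */
        printf("256 KB, 8-way : %ld sets\n", sets(256 * 1024, 64, 8));  /* 512 */
        printf("256 KB, 16-way: %ld sets\n", sets(256 * 1024, 64, 16)); /* 256 */
        return 0;
    }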

The amazing thing (to me at least) is that this is the same point at which returns began to diminish rapidly 10 years ago, when main memory sizes were 3+ orders of magnitude smaller and the gap between memory speed and CPU speed was far narrower. I remember when the amount of cache was a configurable option on many motherboards, and everybody paid much more attention to it (since there was a decision to be made). My initial expectation was that the cache size needed would scale proportionally with the size of the memory being cached, but that is clearly not the case.

Pete's post went a long way towards explaining the importance of "wayness", and why a certain number of cache locations is (to a degree) both necessary and sufficient.
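
As a rough illustration of the mechanism (the geometry numbers below are assumed, not Pete's or any vendor's): each address maps to exactly one set, and an N-way cache can hold N different lines that collide on the same set before anything is evicted.

    #include <stdio.h>
    #include <stdint.h>

    #define LINE_BYTES 64   /* assumed line size */
    #define NUM_SETS   256  /* assumed set count */

    int main(void)
    {
        uint32_t addr = 0x0012ABC0;                        /* arbitrary example */
        uint32_t offset = addr % LINE_BYTES;               /* byte within line  */
        uint32_t set    = (addr / LINE_BYTES) % NUM_SETS;  /* which set         */
        uint32_t tag    = (addr / LINE_BYTES) / NUM_SETS;  /* identifies line   */

        printf("0x%08X -> set %u, tag 0x%X, offset %u\n",
               addr, set, tag, offset);
        /* All addresses with the same set index compete for that
           set's ways: a 16-way cache tolerates 16 such collisions
           before evicting, an 8-way only 8, hence fewer conflict
           misses with more ways. */
        return 0;
    }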

The perpetual crowing from Elmer that AMD can't make large-cache chips is clearly disingenuous (I suspect he knows better). A medium-sized 16-way cache is arguably higher performing (and more difficult to manufacture) than a much larger 8-way one.
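
To put rough numbers on that (assuming 64-byte lines, purely for illustration): a 256 KB 16-way cache has 256 sets but can keep 16 colliding lines resident per set, while a 512 KB 8-way cache has 1024 sets yet starts evicting after only 8 collisions on any one index. Whether the extra ways or the extra sets win depends on the workload's access pattern, which is the sense in which the bigger cache isn't automatically better.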

I posted links to tests of applications that are significant users of cache, database serving and compiling, and Athlon does particularly well in those when compared to much higher-clocked Xeons whose caches are the same size but lower in "wayness". That is probably a good indication of the benefit of the Athlon's cache architecture.