To: Tenchusatsu who wrote (109145), 9/1/2000 12:07:10 AM
From: Rob Young

"anti-Itanium FUD at beginning"

Don't you remember that? Shortly after the MPF presentation, the Intel/HP fellows came back and showed how they could do the 8 Queens problem in fewer instructions, whereupon the Alpha folks showed they could do it in even fewer! Besides, you must appreciate these aren't garage-shop wanna-bes or Internet wanna-bes; these are serious engineers. So when Pete Bannon (who is one of Compaq's senior consultants, an honor actually) trots out the difficulties with Itanium (obvious run-time problems, i.e. Java and function calls, etc.), you can bet he knows what he is talking about, and it doesn't fall under "FUD." He is showing where Itanium is weakest.

<Why 16 as a choice? Most hops are 2. When you add up the latency for those 2 hops you are still doing much better than an L3 hit and of course main memory.>

"You probably meant "L2 hit" instead of "L3 hit." But that's not the point."

Actually, that is what I meant. If a remote L2 that is 2 hops away has *less* latency than a *possible* L3, then you could say the 21364 has a better, more effective 24 MByte L2. Whereas another Alpha or other architecture would have an L1, L2, and L3 that take x cycles to access; that L3 would be 8 MByte, and odds are that with larger-footprint workloads you miss L3 quite often, so it's off to main memory. HOWEVER, 24 MByte is more than enough of a sweet spot (see the earlier referenced paper for some detail) that it is a much larger win. Bag L3 all the way around!

"latencies incurred with heavy P2P traffic, which is highly variable but must still be considered."

No. They use a "smart" router to ensure traffic takes an optimal path; note the 4 channels. System can't be overloaded? Didn't say that. But I did see a foil elsewhere that quoted something like 100 GByte/sec of aggregate main-memory bandwidth.

Regarding the latencies: the 15 ns is "load to use." Maybe I'm misreading that, but my read is that it includes fetching the line from the remote L2 and loading it into the local L1. Besides, if you think about it a bit, they are sandbagging there too. After all, the point-to-point CPU-to-CPU links run at CPU speed B^).

Software optimized to take advantage of it? No, not at all. Remember, the goal here is to run programs unmodified. You misspoke, I believe: it's the OSes that have to be modified. That's a given, and they have been modified and are running on NUMA today.

Best case today, Wildfire local memory access is 330 ns and remote is 960 ns. With the on-chip memory controller, RDRAM local memory access is on the order of 70 ns, I believe. Much better. To hit remote memory over 2 hops you get 70 ns + 15 + 15, maybe some extra factor in there too, so what, 130-150 ns? I think that might be right, but they don't like to talk exact numbers this early.

But where it is much better, and where we are talking past each other in a sense, is the 2-hop remote L2 hit. Traditionally you go over a switch; on the 21364 you don't, and unless I'm mistaken a 2-hop remote L2 hit could be around 45 ns (fudge factor there). You won't get that kind of latency in a traditional architecture. Since OLTP is very latency-sensitive, the 21364 will shine in this space. Power4 has a good thing going there too, but what about 16? 32? 64 CPUs? Can anyone afford 64 Power4 CPUs?

Re-reading: don't get hung up on the UMA-versus-NUMA thing. All Tru64 and VMS software runs unmodified on the current Wildfire "NUMA." Most Wildfires with Tru64 run as a single system image, one big flat memory.
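To make that arithmetic concrete, here is a toy latency model in C. Every constant in it is just a rough figure from this thread (70 ns local memory, ~15 ns per hop "load to use", ~15 ns assumed remote L2 lookup) plus a guessed per-hop router/protocol overhead; none of these are official Compaq numbers, so treat it strictly as a back-of-envelope sketch.

    /* Back-of-envelope latency model for the 21364's 2D mesh, using the
     * rough figures quoted in this thread. All constants are assumptions,
     * not official Compaq numbers. */
    #include <stdio.h>

    #define LOCAL_MEM_NS     70.0  /* on-chip RDRAM controller, local access (quoted estimate) */
    #define HOP_NS           15.0  /* per-hop "load to use" figure quoted above */
    #define ROUTER_FUDGE_NS  20.0  /* guessed per-hop router/protocol overhead */

    /* Estimated latency of a main-memory access that is `hops` router hops away. */
    static double remote_mem_ns(int hops)
    {
        return LOCAL_MEM_NS + hops * (HOP_NS + ROUTER_FUDGE_NS);
    }

    /* Estimated latency of a hit in a remote CPU's L2 cache `hops` away.
     * No DRAM access is involved, only hop cost plus an assumed remote
     * L2 lookup time. */
    static double remote_l2_ns(int hops)
    {
        const double l2_lookup_ns = 15.0;  /* assumed remote L2 access time */
        return l2_lookup_ns + hops * HOP_NS;
    }

    int main(void)
    {
        printf("local memory:        %.0f ns\n", remote_mem_ns(0));  /* 70 ns  */
        printf("2-hop remote memory: %.0f ns\n", remote_mem_ns(2));  /* 140 ns */
        printf("2-hop remote L2 hit: %.0f ns\n", remote_l2_ns(2));   /* 45 ns  */
        return 0;
    }

Run it and you get roughly 70 / 140 / 45 ns, which is where my 130-150 ns remote-memory guess and the ~45 ns remote L2 figure above come from. Compare those against Wildfire's 330 ns local / 960 ns remote and you see why even the "slow" path wins.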
The thing I think you are getting hung up on is that some memory accesses are faster than others. No big deal (but a VERY big deal at the OS level). Look at the "slow" 21364 memory access: it is still much better than the "fastest" Wildfire (aka GS320) memory access.

Rob