Technology Stocks : Advanced Micro Devices - Moderated (AMD)


To: Joe NYC who wrote (11172), 10/2/2000 7:03:47 PM
From: pgerassi
 
Dear Joe:

RDBMS work can be handled by NUMA easily, as most accesses merely use the larger memory as a hard drive cache. The internal Oracle tables rarely go beyond 100MB per thread. When you have many thread groups, you are talking about more than 1 CPU, and 2GB (or 4GB) per CPU is more than enough local memory.

The reason for the 36-bit physical address size of the Xeon is the same as in the 8086: the segment address is shifted 4 bits to the left and added to the offset. How many Windows boxes with more than 4GB are using only one CPU to access all that memory? Do you even know whether Windows allows any one CPU to see more than 4GB simultaneously? If not, what is the delay to reach a different memory area? Do those 8- and 16-DIMM boards connect the memory to the same chip as the one that connects to the CPU? If not, then whatever bus they use between the two chips will introduce roughly the same latency as if no memory were local. Thus your question gets turned on its head: how much of a performance increase would be realized if local memory were used for all high-priority needs (read: running programs, device drivers, and system code)? Is it worth changing the OS to optimize the mapping and/or memory management to take this into account?
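The 8086 trick mentioned above can be sketched in a few lines. This is my own illustration, not anything from the post: a segment value shifted 4 bits left, plus an offset. With 16-bit segment and offset you get the familiar 20-bit (1MB) 8086 space; apply the same 4-bit shift to 32-bit values and you land in a roughly 36-bit (64GB) space.

```python
# Hypothetical sketch of segmented addressing: physical = (segment << 4) + offset.
# On the 8086 both values are 16 bits, giving a ~20-bit (1MB) address space;
# the carry past bit 20 is the famous HMA overflow. The same 4-bit shift
# applied to 32-bit values reaches (modulo the same kind of carry) ~2^36.

def segmented_address(segment: int, offset: int, shift: int = 4) -> int:
    """Return the linear address (segment << shift) + offset."""
    return (segment << shift) + offset

# Classic 8086 examples:
a = segmented_address(0x1234, 0x0010)   # 0x12350
b = segmented_address(0xFFFF, 0xFFFF)   # 0x10FFEF, just past the 1MB line
```

The top-of-range case shows why the scheme slightly overshoots a clean power of two: the sum can carry past the nominal top bit, exactly as 0xFFFF:0xFFFF does on the 8086.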

I suspect that for those applications that need huge datasets, like an RDBMS, the latency will not be a large factor. Just look at local memory as another exclusive cache level in front of total memory. Given this, it becomes rather obvious that as long as local memory holds the working set of the largest app, NUMA architectures are not a problem.
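Treating local memory as one more cache level makes the arithmetic easy to check. A minimal sketch, with the hit rate and latencies as my own illustrative numbers (picked loosely from the memory ranges in the table below, not taken from the post):

```python
# Model local memory as a cache level: a fraction hit_rate of accesses
# land in local memory, the rest pay the far-memory latency.

def effective_latency(hit_rate: float, local_ns: float, far_ns: float) -> float:
    """Average access latency in ns for the two-level local/far model."""
    return hit_rate * local_ns + (1.0 - hit_rate) * far_ns

# If the working set fits locally (say 99% hits), a 60ns local / 250ns far
# split costs only ~62ns on average, barely above pure local access.
avg = effective_latency(0.99, 60.0, 250.0)
```

The point of the sketch is the same as the paragraph's: once the working set fits in local memory, the far-memory penalty is diluted to a few percent.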

You know, it's strange that we come back to the cache picture time and time again. We have the following table of bandwidth, latency (access), size, and granularity:

1) Registers: >50GB/sec, no latency, 128-384 Bytes, 8 Byte.
2) L1 cache: 8-32GB/sec, 2-3ns, 8-64 KB, 32-64 Bytes.
3) L2 cache: 2-32GB/sec, 7-11ns, 64KB-2MB, 32-64 Bytes.
4) Local Memory: 800MB-3.2GB/sec, 40ns-90ns, 64MB-4GB, 8-32MB.
5) Far Memory: 800MB-3.2GB/sec, 60ns-250ns, 4GB-1TB, 64MB-4GB.
6) Online Storage: 10-160 MB/sec, 5-20ms, 10GB-10TB, 10GB-80GB.
7) Web Storage: 2KB-10MB/sec, >100ms, >100MB, >1MB.
8) Offline Storage: 150KB-6MB/sec, >1s, >600MB, >600MB.

Of course, the last four levels vary widely depending on configuration.
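One way to read the table: the cost of pulling one block from a level is roughly its latency plus the block size divided by its bandwidth. A quick sketch using midpoint figures I picked from the ranges above (my own picks, not the post's):

```python
# Approximate cost of one fetch at a given level of the hierarchy:
# time = access latency + (block size / bandwidth).

def fetch_time(latency_s: float, bandwidth_bps: float, size_bytes: float) -> float:
    """Seconds to fetch one block of size_bytes from a level."""
    return latency_s + size_bytes / bandwidth_bps

line = 64                                    # one 64-byte cache line
l1   = fetch_time(2.5e-9, 16e9, line)        # ~6.5 ns
mem  = fetch_time(65e-9, 1.6e9, line)        # ~105 ns
disk = fetch_time(10e-3, 40e6, 64 * 2**10)   # one 64KB read, ~11.6 ms
```

The sketch makes the table's message concrete: latency dominates small transfers at the slow levels (the disk's seek dwarfs its transfer time), which is exactly why each level wants a larger granule than the one above it.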

Pete