Scumbria,
The reason for having local L2s is to avoid having to access the system bus (and thus DRAM). That is why Intel is able to charge a huge premium for Xeon processors with large onboard L2 caches.
In my ideal solution, I forgot to point out that all of those components (2+ CPUs, memory controller and L2) would be in one package.
But in a scenario with 2 CPUs, each one with its own L2 cache, if CPU#1 needs data that is not in its L2, doesn't it need to check whether the data is in the L2 of CPU#2 first, before it can be fetched from DRAM?
If it is true that CPU#1 needs to check whether the data is in the L2 of CPU#X (using the system bus), doesn't a system with 4 or 8 CPUs spend too much time maintaining coherency between the L1s, L2s and main memory?
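To make the question concrete, here is a toy Python sketch of the lookup order on a snooping bus. The function name and data structures are mine, purely for illustration; real protocols (MESI and friends) track per-line states, but the traffic pattern is the same: every local L2 miss forces a broadcast on the shared bus before DRAM can be touched.

```python
def load(cpu_id, addr, l2_caches, dram):
    """Return (data, bus_transactions) for a load by cpu_id.

    l2_caches is a list of per-CPU dicts mapping address -> data;
    dram is a dict standing in for main memory.
    """
    bus_transactions = 0
    # 1. Check the local L2 first -- a hit costs no bus traffic.
    if addr in l2_caches[cpu_id]:
        return l2_caches[cpu_id][addr], bus_transactions
    # 2. Miss: put the address on the system bus; every other CPU
    #    snoops it and answers if its L2 holds a copy.
    bus_transactions += 1
    for other, cache in enumerate(l2_caches):
        if other != cpu_id and addr in cache:
            data = cache[addr]            # cache-to-cache transfer
            l2_caches[cpu_id][addr] = data
            return data, bus_transactions
    # 3. No other cache has it: fetch from DRAM over the same bus.
    bus_transactions += 1
    data = dram[addr]
    l2_caches[cpu_id][addr] = data
    return data, bus_transactions
```

Note that even the DRAM fetch in step 3 could not start until every other CPU had snooped the address, which is exactly why bus bandwidth gets eaten up as the CPU count grows.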
Now, a really ideal solution would be to have multiple somewhat independent processing units built on one die (with integer and floating-point units allocated dynamically to processes, as needed), with a huge L1 + NB on the same die and a fast, very wide path to main memory.
I wouldn't be surprised if this were the architecture of Jalapeno.
Joe |