re: Sun cache lacks ECC?
It is hard to believe that is true. Competitors like IBM would have made a big deal out of that.
Intel worked hard to lose its reputation as a toy processor maker years ago. Here is an announcement from Intel dated July 14, 1997:
intel.com
"Intel Corporation today announced a new version of the Pentium® II processor. This new member of the Pentium II processor family contains ECC (Error Correction Code) on the L2 cache, which enables servers and workstations to operate in business computing environments, where data integrity, transmission and reliability are critical."
Intel continues to work to ensure data integrity: developer.intel.com
Here they explain that Merced's L2 has ECC and its L1 has parity. Maybe Sun has a parity-protected L2 cache. The thinking is that parity is acceptable when errors are very rare. If an L1 single-bit error happens once per 100 years on average, parity will detect it and the server will reboot to prevent data corruption. Reboots are not good, but they are acceptable at a rate of one per 100 years.
The other problem with parity is that it cannot detect double-bit errors. But the thinking is that in a small cache like L1, data never stays long, so the chance of two single-bit errors accumulating in the same word is not significant. With big L2 caches, double-bit errors become a real worry, so Intel uses ECC on the L2.
Maybe Sun uses an L2 cache that is so small that parity is acceptable. That would hurt performance, but Sun does not seem to insist on leading performance.
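To make the parity versus ECC difference concrete, here is a small Python sketch. It is only an illustration under my own assumptions: a toy 4-bit data word, a single even-parity bit for the "parity" case, and a Hamming(7,4) code plus one overall parity bit standing in for SECDED-style ECC. The function names are made up for this post, and nothing here is taken from Intel's or Sun's actual cache designs, which protect much wider words.

```python
def parity_bit(bits):
    """Return the XOR of all bits (the even-parity bit for the word)."""
    p = 0
    for b in bits:
        p ^= b
    return p


def parity_ok(word):
    """word = data bits followed by one parity bit; True if it looks valid."""
    return parity_bit(word) == 0


def secded_encode(data):
    """Encode 4 data bits into 8 bits: Hamming(7,4) plus overall parity.

    Index 0 holds the overall parity bit; indices 1..7 are the usual
    Hamming positions (check bits at 1, 2, 4; data at 3, 5, 6, 7).
    """
    code = [0] * 8
    code[3], code[5], code[6], code[7] = data
    for p in (1, 2, 4):
        # Check bit p covers every position whose index has bit p set.
        code[p] = parity_bit(code[i] for i in range(1, 8) if i & p)
    code[0] = parity_bit(code[1:])
    return code


def secded_decode(code):
    """Return (status, data). Corrects one flipped bit, detects two."""
    code = list(code)
    syndrome = 0
    for i in range(1, 8):
        if code[i]:
            syndrome ^= i          # XOR of the positions holding a 1
    overall = parity_bit(code)     # 0 for an even number of flips
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume one, located at index `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: uncorrectable.
        status = "double-bit error detected"
    return status, [code[3], code[5], code[6], code[7]]
```

With these helpers, flipping one bit of a parity-protected word makes parity_ok return False, but flipping two bits makes it return True again, so the corruption goes unnoticed. The SECDED-style code fixes a single flipped bit and at least flags a double flip instead of handing back bad data silently, which is the argument for ECC on a large L2.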
I do know that Intel has error rate estimates for all of its caches, and uses them to support claims that there is no significant chance of data corruption from cache on its server products.