Dear Jozef:
HT 3.0 was updated so that a 16/16 cHT3.1 link would be faster than Intel's 20/20 QPI link. The latter is quoted at 32GB/s across all 40 bits (20 each way) at 6.4GT/s. However, as all data is 8/10 encoded (8 bits of data in every 10 bits actually sent), the real transfer rate is 25.6GB/s minus overhead.

HT, in any version, has about 4 bytes of overhead per 64 bytes of data sent, plus one frame in every 256 for CRC checksums. Thus the overall overhead on memory data is 5.9%, leaving 94.1% data. A 3.2GHz 16/16 cHT3.1 link therefore nets up to 12.04GB/s of data each way, or 24.09GB/s for the whole link.

Intel uses a different protocol with 10-byte granularity instead of HT's 4-byte granularity. The same 64 bytes of data therefore use at least 16 bytes of overhead, which is 20% of the raw data; coherent actions need another flit, for an overhead of 26 bytes. At 16 bytes of overhead, the 25.6GB/s above leaves an effective 20.48GB/s. 24.09GB/s is much more than 20.48GB/s, which is why Intel wants to use raw rates rather than effective ones.

Using the effective rates, HT needs to run at only 2.72GHz to match a 20.48GB/s link. Since the HT3.0 link goes only to 2.6GHz, it was not fast enough, hence the speed-up to 3.2GHz. If you use the 26 bytes of overhead for cQPI, you get an overhead of 28.9% and an effective data rate of 18.20GB/s, meaning that cHT needs to go to only 2.42GHz, which is below HT3.0's 2.6GHz maximum.
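For anyone who wants to check the arithmetic, here is a rough Python sketch of the numbers above. It is only a back-of-the-envelope calculation, not a model of either protocol. It assumes two transfers per clock on HT (so 3.2GHz gives 6.4GT/s) and leaves out the CRC frame, which appears to be how the 5.9% figure comes out. The function and variable names are made up for illustration.

```python
def data_fraction(data_bytes, overhead_bytes):
    """Share of the bytes on the wire that are payload."""
    return data_bytes / (data_bytes + overhead_bytes)

# --- HyperTransport: 16/16 link at 3.2GHz ---
HT_CLOCK_GHZ = 3.2
HT_WIDTH_BYTES = 2                                   # 16 bits each way
ht_raw_each_way = HT_CLOCK_GHZ * 2 * HT_WIDTH_BYTES  # assumes 2 transfers per clock
ht_data = data_fraction(64, 4)                       # 4 bytes overhead per 64 bytes data
ht_eff_each_way = ht_raw_each_way * ht_data
ht_eff_link = 2 * ht_eff_each_way                    # both directions together

# --- QPI: 20/20 link, figures as quoted in the letter ---
QPI_RAW_GBS = 32.0
qpi_after_encoding = QPI_RAW_GBS * 8 / 10            # 8 data bits per 10 bits sent
qpi_eff = qpi_after_encoding * data_fraction(64, 16)           # non-coherent case
qpi_eff_coherent = qpi_after_encoding * data_fraction(64, 26)  # one extra flit

def ht_clock_to_match(target_gbs):
    """HT clock (GHz) at which a 16/16 link reaches target_gbs effective."""
    return HT_CLOCK_GHZ * target_gbs / ht_eff_link

print(f"HT overhead:            {100 * (1 - ht_data):.1f}%")
print(f"HT effective, link:     {ht_eff_link:.2f} GB/s")
print(f"QPI effective:          {qpi_eff:.2f} GB/s")
print(f"QPI effective, coherent:{qpi_eff_coherent:.2f} GB/s")
print(f"HT clock to match:      {ht_clock_to_match(qpi_eff):.2f} GHz")
print(f"HT clock to match coh.: {ht_clock_to_match(qpi_eff_coherent):.2f} GHz")
```

The last two figures are the ones that matter for the argument: the first sits above HT3.0's 2.6GHz ceiling, the second below it.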
Intel also quotes a 164-pin width for HT, but fails to mention that this is for a 32/32 link. Their 20/20 link uses 96 pins (80 data, 16 clock, 0 control), whereas a 16/16 cHT3.1 link uses only 84 pins (64 data, 8 clock, 12 control). Thus a 16/16 cHT3.1 link uses fewer pins, is faster in effective throughput and is lower in latency. A 32/32 1.4GHz cHT2.0 link is faster than a 3.2GHz 20/20 QPI link, although it uses more pins. A 32/32 3.2GHz cHT3.1 link is over 2.5 times as fast as the 3.2GHz 20/20 QPI link for only 71% more pins, with much lower latency, as the 20/20 QPI link needs 5 cycles to the 16/16 cHT's 2 cycles or the 32/32 cHT's 1 cycle. QPI's minimum link is 5/5 using 24 pins, while HT3.1's minimum link is 2/2 and uses 20 pins (up to HT2.0, it used only 19 pins).
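The pin and speed comparisons can be tallied the same way. This is again only a sketch under the same assumptions (two transfers per clock on HT, 4 bytes of overhead per 64 bytes of data, CRC frame ignored), using the QPI effective rates worked out earlier. The names are hypothetical. Note that the speed ratio for the 32/32 link depends on which QPI figure you compare against, so the sketch prints both.

```python
# Pin counts as given in the letter: (data, clock, control).
PINS = {
    "QPI 20/20": (80, 16, 0),
    "cHT3.1 16/16": (64, 8, 12),
}
totals = {name: sum(parts) for name, parts in PINS.items()}

HT_32_PINS = 164  # total quoted for a 32/32 HT link
extra_pins_pct = (HT_32_PINS / totals["QPI 20/20"] - 1) * 100

def ht_effective_gbs(clock_ghz, width_bits):
    """Effective GB/s over both directions of an HT link.

    Assumes two transfers per clock and 4 bytes of overhead
    per 64 bytes of data.
    """
    raw_each_way = clock_ghz * 2 * width_bits / 8
    return 2 * raw_each_way * 64 / 68

# Effective QPI rates from the letter (16 and 26 bytes of overhead).
QPI_EFFECTIVE = 20.48
QPI_EFFECTIVE_COHERENT = 18.20

ht31_32 = ht_effective_gbs(3.2, 32)  # 32/32 cHT3.1 at 3.2GHz
ht20_32 = ht_effective_gbs(1.4, 32)  # 32/32 cHT2.0 at 1.4GHz

for name, total in totals.items():
    print(f"{name}: {total} pins")
print(f"32/32 HT uses {extra_pins_pct:.0f}% more pins than QPI 20/20")
print(f"32/32 cHT2.0 at 1.4GHz: {ht20_32:.2f} GB/s")
print(f"32/32 cHT3.1 vs QPI:          {ht31_32 / QPI_EFFECTIVE:.2f}x")
print(f"32/32 cHT3.1 vs coherent QPI: {ht31_32 / QPI_EFFECTIVE_COHERENT:.2f}x")
```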
All in all, QPI has a ways to go to get to HT's known capabilities.
Pete