Someone ran this code, modified so that the main loop runs twice to eliminate the effect of program loading. Also note that there are no break statements in the switch, so control simply falls through from the matched case to the end of the switch, performing a long sequence of (load, add, store) instructions.
The results are in the form of "first loop"/"second loop" for 10,000 iterations, in millions of core clocks.

# of cases   code size   P-III ticks   P4 ticks
      1000         44k         32/32      46/45
      2000         68k         69/69    133/132
      3000         92k       106/103    205/199
      5000        132k       190/180    372/349
     10000        232k       399/369    780/716
The ratio of P4/P-III ticks (second loop) for different # of cases:

# of cases   P4/P-III   ratio
      1000      45/32    1.41
      2000     132/69    1.91
      3000    199/103    1.93
      5000    349/180    1.94
     10000    716/369    1.94

Looks like once the # of cases reaches about 2000 the trace cache is largely ineffective, and the ratio is determined by the relative speed of the decoders and the latency of a case sequence. Each of the cases is a sequence of (load, add, store) instructions. Loads and adds on P4 have half the latency of loads and adds on P-III. Nevertheless, P4 takes 1.94 times as many ticks to run this program as P-III.
This program is not that uncommon. In real life it will take fewer than 3000 cases to blow away the trace cache because it is much more likely that each case will have more code than just a single assignment in it.
Kap
#include <stdio.h>

/* Emits one timed copy of the test loop: reseed rand(), read the time
   stamp, run the switch `iterations` times, then print the sum and the
   elapsed tick count. */
static void emit_timed_loop(int iterations, int max_index)
{
    int i;

    printf("srand( 123456 );\n");
    printf("get_stamp;\n");
    printf("for( i = 0; i < %d; i++ )\n", iterations);
    printf("switch ( rand() %% %d ){\n", max_index);
    /* No break statements: control falls through to the end of the switch. */
    for (i = 0; i < max_index; i++)
        printf("case %d: sum += %d;\n", i, i + 1);
    printf("default: sum = sum; }\n");
    printf("get_count;\n");
    /* x is unsigned long, so the generated code prints it with %lu. */
    printf("printf(\"%%d, time=%%lu\\n\", sum, x);\n");
}

int main(int argc, char* argv[])
{
    int iterations = 100000, max_index = 1000;

    if (argc != 3) {
        printf("invocation error: p4_id.exe iterations max_index\n");
        printf("iterations: number of times the switch statement is executed in the loop\n");
        printf("max_index: number of cases the program will generate in the switch statement\n");
        return 1;
    }
    sscanf(argv[1], "%d", &iterations);
    sscanf(argv[2], "%d", &max_index);

    /* Prologue of the generated program. The timing macros use MSVC-style
       inline assembly and keep only the low 32 bits of the time stamp. */
    printf("#include <stdlib.h>\n");
    printf("#include <stdio.h>\n");
    printf("unsigned long x;\n");
    printf("#define get_stamp __asm RDTSC __asm mov [x], eax\n");
    printf("#define get_count __asm RDTSC __asm sub eax, [x] __asm mov [x], eax\n");
    printf("int i, sum;\n");
    printf("int main() {\n");

    /* The loop is emitted twice so that the second run is free of
       program-loading effects. */
    emit_timed_loop(iterations, max_index);
    emit_timed_loop(iterations, max_index);

    printf("return 0;\n}\n");
    return 0;
}