Technology Stocks : Intel Corporation (INTC)

To: Charles Gryba who wrote (150840)
From: kapkan4u
12/3/2001 5:47:22 PM
 
Someone ran this code, modified so that the main loop runs twice to eliminate the effect of program loading. Also note that there are no break statements in the switch, so control just falls through from the matched case to the last case, performing a long sequence of (load, add, store) instructions.
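To make the fall-through concrete, here is a minimal sketch with only four cases. The function name fallthrough_sum is made up for illustration; the generated test program has the same shape, just with thousands of cases.

```c
/* Hypothetical four-case miniature of the generated switch.
   With no break statements, entering at case k also runs every
   case body that follows it. */
int fallthrough_sum(int k)
{
    int sum = 0;

    switch (k) {
    case 0: sum += 1;   /* falls through */
    case 1: sum += 2;   /* falls through */
    case 2: sum += 3;   /* falls through */
    case 3: sum += 4;   /* falls through */
    default: break;     /* out-of-range k adds nothing */
    }
    return sum;
}
```

Entering at case 0 runs all four additions (sum is 10), while entering at case 2 runs only the last two (sum is 7).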

The results are in the form of "first loop"/"second loop" for 10,000 iterations, in millions of core clocks.

# of cases   code size   P-III ticks   P4 ticks

1000         44k         32/32         46/45
2000         68k         69/69         133/132
3000         92k         106/103       205/199
5000         132k        190/180       372/349
10000        232k        399/369       780/716
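A rough way to read these totals is as ticks per executed case body. The sketch below is an estimate only, under two assumptions that are mine: rand() % n picks the entry case roughly uniformly, and the cost of rand() and the loop itself is lumped into the per-case figure. The function names are made up for illustration.

```c
/* Average number of case bodies executed per switch, assuming the entry
   case is uniform over 0..n_cases-1 and control falls through to the end:
   entering at case k runs n_cases - k bodies, so the mean is (n_cases + 1) / 2. */
double avg_cases_executed(int n_cases)
{
    return (n_cases + 1) / 2.0;
}

/* Estimated ticks per executed case body, given a loop total in millions
   of ticks. Loop and rand() overhead are not separated out. */
double ticks_per_case(double mticks, int iterations, int n_cases)
{
    double ticks_per_iteration = mticks * 1e6 / iterations;

    return ticks_per_iteration / avg_cases_executed(n_cases);
}
```

For the 10000-case second loop, ticks_per_case(369, 10000, 10000) comes to roughly 7.4 ticks per case on the P-III, and ticks_per_case(716, 10000, 10000) to roughly 14.3 on the P4.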

The ratio of P4/P3 ticks (second loop) for different # of cases:

1000   45/32    == 1.41
2000   132/69   == 1.91
3000   199/103  == 1.93
5000   349/180  == 1.94
10000  716/369  == 1.94
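The ratios are easy to recheck from the second-loop columns of the results table. A minimal sketch, with a made-up helper name:

```c
/* Ratio of P4 ticks to P-III ticks for one row of the results table.
   Both arguments are second-loop totals in millions of core clocks. */
double p4_to_p3_ratio(int p4_mticks, int p3_mticks)
{
    return (double)p4_mticks / p3_mticks;
}
```

For example, p4_to_p3_ratio(716, 369) is about 1.94 for the 10000-case row.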

It looks like when # of cases is >= 3000, the trace cache is largely ineffective and the ratio is determined by the relative speed of the decoders and the latency of a case sequence. Each case is a sequence of (load, add, store) instructions. Loads and adds on P4 have half the latency of loads and adds on PIII. Nevertheless, P4 takes 1.94 times as many ticks to run this program as PIII.

This kind of program is not that uncommon. In real life it will take fewer than 3000 cases to blow away the trace cache, because each case is much more likely to contain more code than just a single assignment.

Kap

#include <stdio.h>

/* Emit one timed loop of the generated program: reseed rand(), read the
   time stamp counter, run the switch `iterations` times, print the result. */
static void emit_timed_loop(int iterations, int max_index)
{
    int i;

    printf("srand( 123456 );\n");
    printf("get_stamp;\n");
    printf("for( i = 0; i < %d; i++ )\n", iterations);
    printf("switch ( rand() %% %d ){\n", max_index);
    /* No break statements: control falls through to the last case. */
    for (i = 0; i < max_index; i++)
        printf("case %d: sum += %d;\n", i, i + 1);
    printf("default: sum = sum; }\n");
    printf("get_count;\n");
    /* x is an unsigned long, so the generated printf uses %lu. */
    printf("printf(\"%%d, time=%%lu\\n\", sum, x);\n");
}

int main(int argc, char* argv[])
{
    int iterations = 100000, max_index = 1000;

    /* Usage and errors go to stderr; stdout carries the generated source. */
    if (argc != 3) {
        fprintf(stderr, "invocation error: p4_id.exe iterations max_index\n");
        fprintf(stderr, "iterations: number of times the switch statement is executed in the loop\n");
        fprintf(stderr, "max_index: number of cases the program will generate in the switch statement\n");
        return 1;
    }
    if (sscanf(argv[1], "%d", &iterations) != 1 || iterations < 1 ||
        sscanf(argv[2], "%d", &max_index) != 1 || max_index < 1) {
        fprintf(stderr, "invocation error: iterations and max_index must be positive integers\n");
        return 1;
    }

    printf("#include <stdlib.h>\n");
    printf("#include <stdio.h>\n");
    printf("unsigned long x;\n");
    /* MSVC-style inline assembly: RDTSC leaves the low 32 bits of the
       time stamp counter in eax. */
    printf("#define get_stamp __asm RDTSC __asm mov [x], eax\n");
    printf("#define get_count __asm RDTSC __asm sub eax, [x] __asm mov [x], eax\n");
    printf("int i, sum;\n");
    printf("int main() {\n");
    /* The loop is emitted twice so the second pass excludes the effect
       of program loading. */
    emit_timed_loop(iterations, max_index);
    emit_timed_loop(iterations, max_index);
    printf("return 0;\n}\n");
    return 0;
}