To: kapkan4u who wrote (65041)
12/6/2001 2:35:24 PM
From: Joe NYC

Kap (and everyone else),

"BTW, what happened to the idea of self-education?"

LOL. I deserved that.

"Do you have access to a C compiler?"

Yes, I do. The documentation of the switch statement is kind of sparse; it does not say anything about performance or about how it is implemented in assembly. I kind of cringe when I see assembly, but maybe on the weekend, when I have more time, I will take the plunge. The last time I did something in assembly was in college, on a VAX 11.

"The range of case values has to be continuous or dense for the switch to be generated as a jump table, otherwise it will be generated as a sequence of jumps or as a combination of several tables and sequences of jumps."

It's funny how the roles are reversed. I keep having arguments with my wife: I tell her to find out for herself, and she says it's much quicker to ask someone who probably knows the answer. Anyway, it would have taken me hours to arrive at this answer.

BTW, the issues or questions I have with your code are the following (others are welcome to comment).

1. First, the answer: the switch statement in the code will be a jump table, since the case values are sequential and contiguous. It will take only a few cycles to execute, so its execution time is irrelevant for the comparison.

2. My assumption is that the jump table itself is static data in the code, and that this code is loaded into memory when the program is loaded for execution. Does this data need to be delivered to the CPU, or can it just sit in memory, with only a few bytes fetched when the jump of the switch statement asks for the address of the code it needs to jump to? I am assuming the answer is that the jump table can just sit in memory, that it does not need to be transferred to the CPU, and that only the 4 bytes of the desired address in the jump table need to be fetched.
The implication is that this takes just a few clock cycles and can safely be ignored (since we are considering the 1000s of clocks it takes to do 1 iteration).

3. When a routine such as the one we are looking at is executed, specifically when the loop with the switch statement is executed, does the entire code, the code of all the case labels, need to pass through the CPU to be decoded? If so, why?

4. We need to change the code so that the code within the case labels is forced through the complex decoder of the PIII. brushwud already looked this up, and e-mailed me this:

I'm looking at Tom Shanley's _Pentium Pro & Pentium II System Architecture_, which says, "The first of the three decoders is classified as a complex decoder and can translate any IA instruction that translates into between two [I think he really means one] & four micro-ops. The 2nd & 3rd decoders are classified as simple decoders and can only translate IA instructions that translate into single micro-ops... In general, Simple IA instructions of the reg-reg form convert to a single micro-op. Loads convert to a single micro-op. Stores convert to two micro-ops. Simple read/modify instructions convert to two micro-ops. Simple instructions of the reg-mem form convert into two or three micro-ops. MMX instructions are simple. Simple read/modify/write instructions convert into four micro-ops."

So read/modify/write would be a good one to use. How do we know that the variable we will be using will not be optimized by the compiler to sit in a register?

One fact from the execution times of the loops is that the execution time has a linear dependence on the number of cases, which means that somewhere, time is being spent doing some processing on every case. And if the test is correctly set up, we need to confirm that this time is being spent on feeding the series of statements inside each case (right now ADD) through the decoder into the CPU.

Joe
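
On the register question in point 4, one possible way to set it up in C is sketched below. It is not the code under discussion. The names counter, step and run_cases, and the choice of eight cases, are made up for illustration. Declaring the variable volatile should stop the compiler from keeping it in a register, but whether each += becomes a single read/modify/write instruction or a separate load, add and store is an assumption that still has to be checked in the assembly listing. The same goes for the jump table: contiguous case values make it likely, not guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* volatile: every access below must really read and write memory,
   so the compiler may not cache the counter in a register. */
static volatile uint32_t counter;

/* Hypothetical test body. Case values 0..7 are contiguous, so a compiler
   would typically (not necessarily) turn this switch into a jump table. */
static void step(unsigned selector)
{
    assert(selector < 8);   /* only the eight cases below exist */
    switch (selector) {
    case 0: counter += 1; break;
    case 1: counter += 2; break;
    case 2: counter += 3; break;
    case 3: counter += 4; break;
    case 4: counter += 5; break;
    case 5: counter += 6; break;
    case 6: counter += 7; break;
    case 7: counter += 8; break;
    }
}

/* Resets the counter, runs the switch `iterations` times while cycling
   through the cases in order, and returns the final counter value. */
uint32_t run_cases(unsigned iterations)
{
    counter = 0;
    for (unsigned i = 0; i < iterations; i++)
        step(i % 8);
    return counter;
}
```

Calling run_cases from a timing loop and comparing the generated assembly with and without volatile would show whether the memory operand survives optimization.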