Higher Data Throughput over POTS
Stephane Gagne, Spectrum Signal Processing Inc., Canada (icspat.com)
multiplexed over one modem line. Consecutive PPP packets can have very different compression characteristics, which makes the task of compressing the data much more challenging. The ability to react quickly and appropriately is paramount. For the first heuristic mentioned above, there exists an optimal algorithm (5).

2.1 Mode transition

Let a be the cost (in bits) of switching from Transparent mode to Compression mode, and b the cost of switching from Compression mode to Transparent mode. In the current standard, a is a constant equal to 16 (8-bit escape character + 8-bit escape code for the transition). b, on the other hand, varies and is given (6) by:

    b = 2g + j, if no step up of the codeword size is required.
    b = 3g + j + 2, if a step up is required.

    g, j ∈ ℕ, with 9 ≤ g ≤ ⌈log2(max nb codewords)⌉ and 0 ≤ j ≤ 7

where g is the current codeword size and j is the byte complement (the number of bits required to bring the length of the current stream of data to a multiple of 8 bits).

The compressor should switch mode only if it knows the benefits will at least compensate for the cost of switching (either a or b). Such knowledge is available if we change the dynamic nature of V.42bis. Instead of giving bytes to the datapump as they are received from the DTE (7) and processed by the compressor, we delay the process by using two buffers: the first is used for Transparent mode and the second for Compression mode. Regardless of which mode we are in, we always write the output of both modes into the appropriate buffer.

(5) With K1 as an upper bound, where K1 is the size of the look-ahead buffer. A value of 32 (16-bit words) for K1 is a good compromise between memory usage and compression ratio.
(6) We assume here that there is a codeword to be flushed; otherwise b = g + j.
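The switching costs above are easy to sanity-check. The following is a hypothetical Python sketch; the function names are ours, and only the formulas come from the text:

```python
# Hypothetical helpers for the V.42bis mode-switch costs described above.
# The names are ours; the formulas are those given in the text.

def switch_cost_to_compression():
    # a: 8-bit escape character + 8-bit escape code for the transition.
    return 16

def switch_cost_to_transparent(g, j, codeword_to_flush=True, step_up=False):
    # b depends on the current codeword size g (in bits) and the byte
    # complement j (bits needed to pad the stream to a multiple of 8).
    if not codeword_to_flush:
        return g + j          # footnote (6): no pending codeword
    if step_up:
        return 3 * g + j + 2  # a step up of the codeword size is required
    return 2 * g + j

assert switch_cost_to_compression() == 16
assert switch_cost_to_transparent(g=9, j=0) == 18
assert switch_cost_to_transparent(g=9, j=7, step_up=True) == 36
```

Note that a is fixed by the standard while b spans a range (from g + j up to 3g + j + 2), which is why the algorithm below treats it as a variable.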
Once we know which mode was best, we give data to the datapump from the appropriate buffer, then discard both buffers and repeat the process (8). The question now is: how do we know which mode was best? Say we are currently in Transparent mode and we process a short stream of data we have just received. If, after processing it, we find we would save exactly a bits by using the second (compressed) buffer, should we make the transition (i.e., do we have enough knowledge to commit to either mode)? The answer is no. If we next receive lots of incompressible data, we will need to switch mode again, incurring the additional cost b, which we would not have had to pay had we remained in Transparent mode in the first place. On the other hand, if we were to wait (in the previous example) until the gains (or savings) were equal to a + b, we have the opposite problem: b is counted twice (once for this transition and again, later, for the subsequent transition). The trick is to have the two transitions share the cost. Hence, before committing to a mode, we wait until the gains are greater than or equal to:

    (1) a + b/2, for a transition from Transparent mode to Compression mode.
    (2) b + a/2, for a transition from Compression mode to Transparent mode.

The equations above describe the gains we need to achieve to commit to a mode switch. If, on the other hand, we find that the current mode's encoding is at least as good as that of the other mode, we can give the appropriate data to the datapump right away. Figure 1 gives the general algorithm.

(7) Data Terminal Equipment.
(8) Of course, if the appropriate buffer turned out to be the compressed buffer when we were in Transparent mode, or the Transparent-mode buffer when we were in Compression mode, then we give the appropriate codes (to switch mode) to the datapump before giving it the data from the buffer.
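The cost-sharing thresholds (1) and (2) can be sketched in a few lines of hypothetical Python (the function name is ours; only the formulas come from the text):

```python
# Sketch of the cost-sharing thresholds (1) and (2) described above.

def commit_threshold(a, b, current_mode):
    if current_mode == "transparent":
        return a + b / 2   # (1): transparent -> compression
    return b + a / 2       # (2): compression -> transparent

# With a = 16 and, say, b = 20 (e.g. g = 9, j = 2, no step up), a switch
# out of Transparent mode needs 26 bits of accumulated savings, and a
# switch back needs 28:
assert commit_threshold(16, 20, "transparent") == 26.0
assert commit_threshold(16, 20, "compression") == 28.0
```

The two naive alternatives discussed in the text, a threshold of a (pays b needlessly on a reversal) and a threshold of a + b (counts b twice), bracket this value; sharing the reverse-transition cost splits the difference.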
2.2 Discarding the transmission dictionary

When it comes to the heuristic of discarding the transmission dictionary, however, the nature of the compression algorithm itself prevents the existence of an optimal algorithm similar to the one described in section 2.1. For the compression scheme to work effectively, the dictionary must be trained. Modem manufacturers typically use dictionaries with 2048 entries, and building and maintaining such a dictionary requires large amounts of data. In the previous section we changed the dynamic nature of V.42bis by giving ourselves the equivalent of a look-ahead buffer of length K1. Because of the large amount of data associated with rebuilding a dictionary, it is not practical to implement an optimal algorithm here: K2, the size of the look-ahead buffer for this heuristic, would simply be too large to be cost-effective.

A good alternative, which is not optimal but does not require much memory, is to compare the number of dictionary entries that have actually been used (to give data to the datapump) with the number of entries that have been created but never used. Since this algorithm is not optimal, stochastic experiments are required to find the best thresholds to use in practice.

3. Modified V.42bis encoder

If you do not mind getting slightly out of line with the V.42bis standard, there is a third way to improve your compression ratio. This possibility arises from yet another escape code, namely FLUSH. The FLUSH escape code is provided to get the last codeword out of the compressor; this is required when you stop receiving data from the DTE. Nothing, however, prevents us from using it for other reasons. The FLUSH escape code has a cost h = g + j associated with it. At first glance it seems counter-intuitive to incur such a cost when you do not have to, but as will be shown next, doing so can prove worthwhile.
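The FLUSH cost h = g + j is simple arithmetic; a minimal sketch, with helper names of our own, makes the byte complement j concrete:

```python
# Minimal sketch of the FLUSH cost h = g + j, where g is the current
# codeword size and j is the byte complement (bits needed to bring the
# output stream to a multiple of 8 bits). Helper names are ours.

def byte_complement(stream_length_bits):
    return (8 - stream_length_bits % 8) % 8

def flush_cost(g, stream_length_bits):
    return g + byte_complement(stream_length_bits)

assert byte_complement(16) == 0   # already byte-aligned, j = 0
assert byte_complement(13) == 3   # 3 padding bits to reach 16
assert flush_cost(9, 13) == 12    # h = 9 + 3
```

So h ranges from g (aligned output) to g + 7, which bounds the price of the deliberate flushes discussed below.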
Encode( char single)
{
    integer cost;

    Write_to_buffer1( single);      // We just added 8 bits to buf1
    cost := -8;

    // The following function writes bits to buf2
    // if the compressor gave it any.
    // It returns TRUE when bits were written.
    IF( Compress_to_buffer2( single) = TRUE)
    {
        IF( Stepup_occurred = TRUE)
        {
            cost := cost + 2g - 1;
        }
        ELSE
        {
            cost := cost + g;
        }
    }

    IF( Current_mode = TRANSPARENT)
    {
        balance := balance + cost;
        IF( balance >= a)
        {
            balance := a;
            Give_Buffer1_to_datapump();
            Discard_buffers();
        }
        ELSE IF( balance <= -b / 2)
        {
            // This fn switches mode and deals
            // with the buffers appropriately.
            Switch_to_compression_mode();
            balance := b;
        }
    }
    ELSE
    {
        balance := balance - cost;
        IF( balance >= b)
        {
            balance := b;
            Give_Buffer2_to_datapump();
            Discard_buffers();
        }
        ELSE IF( balance <= -a / 2)
        {
            // This fn switches mode and deals
            // with the buffers appropriately.
            Switch_to_transparent_mode();
            balance := a;
        }
    }
}

Figure 1

S1 = "... decentralisation of governmental ..."

Figure 2

Let's say we have processed the string S1 given in Figure 2 up to and including the second letter of "decentralisation". Furthermore, let's suppose that the dictionary we are using currently holds the strings "decrement" and "centralisation". If we were encoding the information in the traditional way, the following would be done when processing the third character of "decentralisation": add 'c' to the current string to make "dec"; since "dec" is already in the dictionary, proceed with the next character. Add 'e' to the current string to make "dece"; "dece" is not in the dictionary, so output the codeword for "dec" and add an entry for "dece" to the dictionary. Reset the current string to the last character seen, "e", and proceed with the next character.

Encoded in such a fashion, the word "decentralisation" might require up to 14g bits. On the other hand, if we had used the FLUSH escape code after seeing the first few letters, we could have encoded "decentralisation" using only 2g + h bits:

    • g bits for the first few letters
    • h bits for the FLUSH code
    • g bits for "centralisation"

Hence we would have been able to encode the word "decentralisation" with savings of up to 78% compared with the conventional encoding. While savings will not be this significant in practice, this nevertheless points out a weakness in V.42bis:

    V.42bis does not maximise the usage of long strings already in its dictionary, even though these are the ones with the highest compression ratio.

It is noteworthy that the technique described in the previous example has some beneficial side-effects (in addition to the savings already made). First, we do not delete as many entries in the dictionary as we would have otherwise, which helps us keep as many long strings as possible in the dictionary. Second, by using our long strings more often than V.42bis traditionally does, we make these strings grow longer and longer, which is to our advantage.
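The arithmetic of the example above is easy to verify. The following sketch uses helper names of our own; the figures 14g and 2g + h come from the text:

```python
# Verifying the savings claimed in the "decentralisation" example.

def conventional_bits(g):
    return 14 * g            # up to 14 codewords of g bits each

def flush_trick_bits(g, j=0):
    h = g + j                # cost of the FLUSH escape code
    return 2 * g + h         # codeword + FLUSH + codeword

def savings(g, j=0):
    return 1 - flush_trick_bits(g, j) / conventional_bits(g)

# With j = 0 the saving is 1 - 3/14, roughly 78.6%, independent of g:
assert abs(savings(9) - 11 / 14) < 1e-9
assert abs(savings(12) - 11 / 14) < 1e-9
```

With a nonzero byte complement (j up to 7) the saving shrinks slightly, but remains well above 70% for any legal codeword size, which is the point of the example.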
All in all, these two side-effects are similar to using a larger dictionary, without the associated drawback of substantially increasing the amount of memory required.

As you may have guessed by now, the technique is based on the one introduced in section 2.1: we delay our commitment to a given encoding. Instead, we buffer the different avenues until we know enough to make a decision. This time around, though, it is a little trickier, as we have more than two cases to keep track of. We need to keep track of all the paths that are within one FLUSH escape code of the best case seen. To be more specific, an optimal algorithm would keep track of all the ways of encoding the information that are within h bits of the best case.

Unfortunately, this turns out to be a major drawback: it increases the MIPS consumption manyfold. In fact, it can degenerate in certain cases (9) and require (on average) over L1/2 times as much computational power as would normally be required, where L1 is the maximum string length. Though this might not prove to be a problem for host-based V.42bis compression, it is more than we could handle on an inexpensive board. In practice we could lower the maximum number of strings or encodings being pursued in parallel, but even then it was too much in our case. Consequently, we decided not to use this technique, despite the fact that it would have increased our compression ratio. Only designs with a large amount of MIPS (as well as data memory) to spare could afford it.

4. Conclusion

We investigated ways of improving implementations of the V.42bis compressor such that all V.42bis-compliant modems on the market would decode the stream of compressed bits correctly. We presented an optimal algorithm to determine the most appropriate time to switch from Transparent mode to Compression mode and vice versa.

(9) A file containing the same character repeated over and over would be such a degenerate case.
We have also shown that, with judicious use of the FLUSH escape code, we can better exploit the V.42bis dictionary, slowing down entry deletions while helping long strings to grow, which results in a better compression ratio.

References

[1] ITU-T Recommendation V.42bis, 1990.
[2] Clark Thomborson, "The V.42bis Standard for Data-Compression Modems", IEEE Micro, October 1992, pp. 41-53.
[3] M. Burrows and D. J. Wheeler, "A Block-sorting Lossless Data Compression Algorithm", Research Report 124, Digital Systems Research Centre, May 1994.
[4] Mark Nelson, The Data Compression Book, M&T Books, 1992.
[5] Allen Gersho and Robert M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1992.