Higher Data Throughput over POTS
Stephane Gagne, Spectrum Signal Processing Inc., Canada (icspat.com)
multiplexed over one modem line. Consecutive PPP packets can have very different compression characteristics, which makes the task of compressing the data much more challenging. The ability to react quickly and appropriately is paramount. For the first heuristic mentioned above, there exists an optimal algorithm (5).

2.1 Mode transition

Let a be the cost (in bits) of switching from Transparent mode to Compression mode, and b the cost of switching from Compression mode to Transparent mode. In the current standard, a is a constant equal to 16 (8-bit escape character + 8-bit escape code for the transition). b, on the other hand, varies and is given (6) by:

    b = 2g + j, if no step up of the codeword size is required.
    b = 3g + j + 2, if a step up is required.

    g, j ∈ ℕ, with 9 ≤ g ≤ ⌈log2(max nb codewords)⌉ and 0 ≤ j ≤ 7

where g is the current codeword size and j is the byte complement (the number of bits required to bring the length of the current stream of data to a multiple of 8 bits).

The compressor should switch mode only if it knows the benefits will at least compensate for the cost of switching (either a or b). Such knowledge is available if we change the dynamic nature of V.42bis. Instead of giving bytes to the datapump as they are received from the DTE (7) and processed by the compressor, we delay the process by using two buffers: the first is used for Transparent mode and the second for Compression mode. Regardless of which mode we are in, we always write the output of both modes into the appropriate buffer.

(5) With K1 as an upper bound, where K1 is the size of the look-ahead buffer. A value of 32 (16-bit words) for K1 is a good compromise between memory usage and compression ratio.
(6) We assume here that there is a codeword to be flushed; otherwise b = g + j.
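The switching costs above are easy to sanity-check. The following is a hypothetical Python sketch; the function names are ours, and only the formulas come from the text:

```python
# Hypothetical helpers for the V.42bis mode-switch costs described above.
# The names are ours; the formulas are those given in the text.

def switch_cost_to_compression():
    # a: 8-bit escape character + 8-bit escape code for the transition.
    return 16

def switch_cost_to_transparent(g, j, codeword_to_flush=True, step_up=False):
    # b depends on the current codeword size g (in bits) and the byte
    # complement j (bits needed to pad the stream to a multiple of 8).
    if not codeword_to_flush:
        return g + j          # footnote (6): no pending codeword
    if step_up:
        return 3 * g + j + 2  # a step up of the codeword size is required
    return 2 * g + j

assert switch_cost_to_compression() == 16
assert switch_cost_to_transparent(g=9, j=0) == 18
assert switch_cost_to_transparent(g=9, j=7, step_up=True) == 36
```

Note that a is fixed by the standard while b spans a range (from g + j up to 3g + j + 2), which is why the algorithm below treats it as a variable.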
Once we know which mode was best, we give data to the datapump from the appropriate buffer, then discard both buffers and repeat the process (8). The question now is: how do we know which mode was best? Say we are currently in Transparent mode and we process a short stream of data we have just received. If, after processing it, we find we would save exactly a bits by using the second (compressed) buffer, should we make the transition (i.e., do we have enough knowledge to commit to either mode)? The answer is no. If we next receive lots of incompressible data, we will need to switch mode again, incurring the additional cost b, which we would not have had to pay had we remained in Transparent mode in the first place. On the other hand, if we were to wait (in the previous example) until the gains (or savings) were equal to a + b, we have the opposite problem: b is counted twice (once for this transition and again, later, for the subsequent transition). The trick is to have the two transitions share the cost. Hence, before committing to a mode, we wait until the gains are greater than or equal to:

    (1) a + b/2, for a transition from Transparent mode to Compression mode.
    (2) b + a/2, for a transition from Compression mode to Transparent mode.

The equations above describe the gains we need to achieve to commit to a mode switch. If, on the other hand, we find that the current mode's encoding is at least as good as that of the other mode, we can give the appropriate data to the datapump right away. Figure 1 gives the general algorithm.

(7) Data Terminal Equipment.
(8) Of course, if the appropriate buffer turned out to be the compressed buffer when we were in Transparent mode, or the Transparent-mode buffer when we were in Compression mode, then we give the appropriate codes (to switch mode) to the datapump before giving it the data from the buffer.
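The cost-sharing thresholds (1) and (2) can be sketched in a few lines of hypothetical Python (the function name is ours; only the formulas come from the text):

```python
# Sketch of the cost-sharing thresholds (1) and (2) described above.

def commit_threshold(a, b, current_mode):
    if current_mode == "transparent":
        return a + b / 2   # (1): transparent -> compression
    return b + a / 2       # (2): compression -> transparent

# With a = 16 and, say, b = 20 (e.g. g = 9, j = 2, no step up), a switch
# out of Transparent mode needs 26 bits of accumulated savings, and a
# switch back needs 28:
assert commit_threshold(16, 20, "transparent") == 26.0
assert commit_threshold(16, 20, "compression") == 28.0
```

The two naive alternatives discussed in the text, a threshold of a (pays b needlessly on a reversal) and a threshold of a + b (counts b twice), bracket this value; sharing the reverse-transition cost splits the difference.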
2.2 Discarding the transmission dictionary

When it comes to the heuristic of discarding the transmission dictionary, however, the nature of the compression algorithm itself prevents the existence of an optimal algorithm similar to the one described in section 2.1. For the compression scheme to work effectively, the dictionary must be trained. Modem manufacturers typically use dictionaries with 2048 entries, and building and maintaining such a dictionary requires large amounts of data. In the previous section we changed the dynamic nature of V.42bis by giving ourselves the equivalent of a look-ahead buffer of length K1. Because of the large amount of data associated with rebuilding a dictionary, it is not practical to implement an optimal algorithm here: K2, the size of the look-ahead buffer for this heuristic, would simply be too large to be cost-effective.

A good alternative, which is not optimal but does not require much memory, is to compare the number of dictionary entries that have actually been used (to give data to the datapump) with the number of entries that have been created but never used. Since this algorithm is not optimal, stochastic experiments are required to find the best thresholds to use in practice.

3. Modified V.42bis encoder

If you do not mind getting slightly out of line with the V.42bis standard, there is a third way to improve your compression ratio. This possibility arises from yet another escape code, namely FLUSH. The FLUSH escape code is provided to get the last codeword out of the compressor; this is required when you stop receiving data from the DTE. Nothing, however, prevents us from using it for other reasons. The FLUSH escape code has a cost h = g + j associated with it. At first glance it seems counter-intuitive to incur such a cost when you do not have to, but as will be shown next, doing so can prove worthwhile.
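The FLUSH cost h = g + j is simple arithmetic; a minimal sketch, with helper names of our own, makes the byte complement j concrete:

```python
# Minimal sketch of the FLUSH cost h = g + j, where g is the current
# codeword size and j is the byte complement (bits needed to bring the
# output stream to a multiple of 8 bits). Helper names are ours.

def byte_complement(stream_length_bits):
    return (8 - stream_length_bits % 8) % 8

def flush_cost(g, stream_length_bits):
    return g + byte_complement(stream_length_bits)

assert byte_complement(16) == 0   # already byte-aligned, j = 0
assert byte_complement(13) == 3   # 3 padding bits to reach 16
assert flush_cost(9, 13) == 12    # h = 9 + 3
```

So h ranges from g (aligned output) to g + 7, which bounds the price of the deliberate flushes discussed below.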
Encode( char single)
{
    integer cost;

    Write_to_buffer1( single);      // We just added 8 bits to buf1
    cost := -8;

    // The following function writes bits to buf2
    // if the compressor gave it any.
    // It returns TRUE when bits were written.
    IF( Compress_to_buffer2( single) = TRUE)
    {
        IF( Stepup_occurred = TRUE)
        {
            cost := cost + 2g - 1;
        }
        ELSE
        {
            cost := cost + g;
        }
    }

    IF( Current_mode = TRANSPARENT)
    {
        balance := balance + cost;
        IF( balance >= a)
        {
            balance := a;
            Give_Buffer1_to_datapump();
            Discard_buffers();
        }
        ELSE IF( balance <= -b / 2)
        {
            // This fn switches mode and deals
            // with the buffers appropriately.
            Switch_to_compression_mode();
            balance := b;
        }
    }
    ELSE
    {
        balance := balance - cost;
        IF( balance >= b)
        {
            balance := b;
            Give_Buffer2_to_datapump();
            Discard_buffers();
        }
        ELSE IF( balance <= -a / 2)
        {
            // This fn switches mode and deals
            // with the buffers appropriately.
            Switch_to_transparent_mode();
            balance := a;
        }
    }
}

Figure 1

S1 = "... decentralisation of governmental ..."

Figure 2

Let's say we have processed the string S1 given in Figure 2 up to and including the second letter of "decentralisation". Furthermore, let's suppose that the dictionary we are using currently holds the strings "decrement" and "centralisation". If we were encoding the information in the traditional way, the following would be done when processing the third character of "decentralisation": add 'c' to the current string to make "dec"; since "dec" is already in the dictionary, proceed with the next character. Add 'e' to the current string to make "dece"; "dece" is not in the dictionary, so output the codeword for "dec" and add an entry for "dece" to the dictionary. Reset the current string to the last character seen, "e", and proceed with the next character.

Encoded in such a fashion, the word "decentralisation" might require up to 14g bits. On the other hand, if we had used the FLUSH escape code after seeing the first few letters, we could have encoded "decentralisation" using only 2g + h bits:

    • g bits for the first few letters
    • h bits for the FLUSH code
    • g bits for "centralisation"

Hence we would have been able to encode the word "decentralisation" with savings of up to 78% compared with the conventional encoding. While savings will not be this significant in practice, this nevertheless points out a weakness in V.42bis:

    V.42bis does not maximise the usage of long strings already in its dictionary, even though these are the ones with the highest compression ratio.

It is noteworthy that the technique described in the previous example has some beneficial side-effects (in addition to the savings already made). First, we do not delete as many entries in the dictionary as we would have otherwise, which helps us keep as many long strings as possible in the dictionary. Second, by using our long strings more often than V.42bis traditionally does, we make these strings grow longer and longer, which is to our advantage.
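The arithmetic of the example above is easy to verify. The following sketch uses helper names of our own; the figures 14g and 2g + h come from the text:

```python
# Verifying the savings claimed in the "decentralisation" example.

def conventional_bits(g):
    return 14 * g            # up to 14 codewords of g bits each

def flush_trick_bits(g, j=0):
    h = g + j                # cost of the FLUSH escape code
    return 2 * g + h         # codeword + FLUSH + codeword

def savings(g, j=0):
    return 1 - flush_trick_bits(g, j) / conventional_bits(g)

# With j = 0 the saving is 1 - 3/14, roughly 78.6%, independent of g:
assert abs(savings(9) - 11 / 14) < 1e-9
assert abs(savings(12) - 11 / 14) < 1e-9
```

With a nonzero byte complement (j up to 7) the saving shrinks slightly, but remains well above 70% for any legal codeword size, which is the point of the example.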
All in all, these two side-effects are similar to using a larger dictionary, without the associated drawback of substantially increasing the amount of memory required.

As you may have guessed by now, the technique is based on the one introduced in section 2.1: we delay our commitment to a given encoding. Instead, we buffer the different avenues until we know enough to make a decision. This time around, though, it is a little trickier, as we have more than two cases to keep track of. We need to keep track of all the paths that are within one FLUSH escape code of the best case seen. To be more specific, an optimal algorithm would keep track of all the ways of encoding the information that are within h bits of the best case.

Unfortunately, this turns out to be a major drawback: it increases the MIPS consumption manyfold. In fact, it can degenerate in certain cases (9) and require (on average) over L1/2 times as much computational power as would normally be required, where L1 is the maximum string length. Though this might not prove to be a problem for host-based V.42bis compression, it is more than we could handle on an inexpensive board. In practice we could lower the maximum number of strings or encodings being pursued in parallel, but even then it was too much in our case. Consequently, we decided not to use this technique, despite the fact that it would have increased our compression ratio. Only designs with a large amount of MIPS (as well as data memory) to spare could afford it.

4. Conclusion

We investigated ways of improving implementations of the V.42bis compressor such that all V.42bis-compliant modems on the market would decode the stream of compressed bits correctly. We presented an optimal algorithm to determine the most appropriate time to switch from Transparent mode to Compression mode and vice versa.

(9) A file containing the same character repeated over and over would be such a degenerate case.
We have also shown that, with judicious use of the FLUSH escape code, we can better exploit the V.42bis dictionary, slowing down entry deletions while helping long strings to grow, which results in a better compression ratio.

References

[1] ITU-T Recommendation V.42bis, 1990.
[2] Clark Thomborson, "The V.42bis Standard for Data-Compression Modems", IEEE Micro, October 1992, pp. 41-53.
[3] M. Burrows and D. J. Wheeler, "A Block-sorting Lossless Data Compression Algorithm", Research Report 124, Digital Systems Research Centre, May 1994.
[4] Mark Nelson, The Data Compression Book, M&T Books, 1992.
[5] Allen Gersho and Robert M. Gray, Vector Quantization and Signal Compression, Kluwer Academic Publishers, 1992.