> So, one company is using 11kb vocoders of some kind; another is using 11kb of another kind, a third is using 6kb vocoders....
Tim,
The mixing-and-matching problems caused by disparate coding schemes have been partially solved through the use of DSP flash adaptation techniques. Cellular and PCS telephone systems, as well as the newer wireline programmable switches...
...all of them interface to the PSTN already, using this approach very successfully.
Flash (on-the-fly) techniques also promise to be the solution to many potential VoIP problems caused by the non-uniformity of algorithms in that sector.
VoIP vendors and providers use no fewer than a half dozen different compression algorithms today, despite attempts at standardization. Some of these are even used for cellular and PCS. And recall, cellular and PCS are moving toward IP as well.
In cellular, however, the number of conversions is for the most part controlled by the more rigid architecture of the established carriers, and is therefore limited to two to four translations at most, yielding minimal distortion in the analog end product.
The scenario you've painted above, however, can be, and probably will be, more problematic for some period of time, as fledgling ITSPs and johnny-come-lately ISPs-turned-telcos increase in number, spawning a growing number of island (isolated) networks. Each of these isolated nets will potentially utilize its own favored low-bit-rate algorithm, depending in large part on the vendor it chooses. Many of these isolated nets will hand off calls to one another (and to enterprise VoIP network fabrics) using traditional telco interfacing standards (DS0s, T1s, PRIs, etc.). The significance of using many traditional DS0, T1, etc. interfaces is that each one increases the number of encoding/decoding stages that must take place on the end-to-end call.
Each of these isolated (proprietary) networks will conceivably introduce its own form of coding, and thus its own form of distortion as well, each near-negligible to the ear on its own. But when combined through multiple tandem connections, they will result in exacerbated latency problems and other anomalies, such as apparent gender changes and other radical shifts in analog speech attributes. The way around this kludge is to encode once and decode once. But who am I to suggest limiting the dreams of wannabe next-gen telcoists? The industry isn't ready for encode-and-decode-once yet, so we'll go through this evolution and probably learn a few tough lessons, again, the hard way.
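To put rough numbers on why tandem encode/decode stages hurt, here is a back-of-the-envelope sketch of how one-way delay stacks up across island networks versus the encode-once, decode-once case. The per-codec figures and the fixed transport allowance below are illustrative assumptions, not field measurements:

```python
# Toy model: one-way delay accumulation across tandem codec stages.
# Per-stage figures (frame + lookahead + processing per encode/decode pair)
# are assumed round numbers for illustration only.

CODEC_DELAY_MS = {
    "G.729": 25.0,
    "G.723.1": 67.5,
    "GSM-FR": 40.0,
}

def tandem_delay(stages, fixed_transport_ms=40.0):
    """Total one-way delay for a chain of tandem codec stages."""
    return fixed_transport_ms + sum(CODEC_DELAY_MS[c] for c in stages)

one_hop = tandem_delay(["G.729"])                            # encode once, decode once
islands = tandem_delay(["G.729", "G.723.1", "GSM-FR"])       # three dissimilar islands

print(f"single codec pair : {one_hop:.1f} ms one-way")
print(f"three tandem pairs: {islands:.1f} ms one-way")
```

Even with these charitable assumptions, three dissimilar islands push the call well past the roughly 150 ms one-way budget that ITU-T G.114 suggests for comfortable interactive speech, while the single-pair case sits comfortably inside it.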
There have been tests and predictive analyses done on these conditions in the past, as you've suggested. Statistically, instantaneous peaks from the patterns generated by one algorithm may not stack in time with the peaks of another's. In-phase and out-of-phase, additive and subtractive, effects take place, not only on the intended signal but on the digitization artifacts of those signals as well. These lead to exaggerated levels of quantizing errors (aka quantizing noise), aliasing and foldover products, nonlinear distortion (NLD) by-products, THD, etc. Again, when these exist on a standalone basis they are negligible, or at least not severe to the ear, but when they result from an excessive number of dissimilar and phase-destructive translations (read: multiple back-to-back encodes and decodes), they can really become bad news for the listener.
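The quantizing-noise part of this can be seen with a deliberately simplified numerical toy: uniform quantizers of different resolutions stand in for dissimilar low-bit-rate coders, and a test tone is passed through them back to back. This is a sketch, not a codec simulation; real vocoders are far more complex, but the direction of the effect is the same:

```python
# Toy demonstration: SNR erosion through tandem "transcodes".
# Uniform quantizers at decreasing bit depths are stand-ins for
# dissimilar codecs; each stage adds its own quantizing error.

import math

def quantize(samples, bits):
    """Uniform quantizer on [-1, 1] with 2**bits levels."""
    step = 2.0 / (2 ** bits)
    return [round(s / step) * step for s in samples]

def snr_db(reference, degraded):
    """SNR of the degraded signal measured against the original."""
    sig = sum(r * r for r in reference)
    err = sum((r - d) ** 2 for r, d in zip(reference, degraded))
    return 10 * math.log10(sig / err)

# One second of a 440 Hz tone sampled at 8 kHz, amplitude 0.9
tone = [0.9 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]

chain = tone
for bits in (8, 6, 4):               # three tandem stages, each a bit coarser
    chain = quantize(chain, bits)
    print(f"after {bits}-bit stage: SNR = {snr_db(tone, chain):5.1f} dB")
```

Each stage alone would be tolerable, but measured against the original signal the chain can never be better than its worst stage, and every dissimilar translation added along the way only moves the result further from the source.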
Early experiences with these phenomena were mine to behold in the early to mid Eighties, when every T1 multiplexer vendor had its own low-bit-rate voice algorithm. We needed to incorporate quality-control measures and in-house standards for clients, spelling out which multiplexers (actually, which trunks, using which specific encoding schemes) could work with which others. Early on, it became clear that installing back-to-back sections of 8 kb/s and 32 kb/s coded voice would result in excessive amounts of delay, with unpredictable levels of quality. Traders on securities desks explicitly did not want these for the turrets on their private-line ring-down circuits when those were configured in tandem fashion, due to the distracting qualities introduced by the delays and other artifacts. Some of the more enlightened ones actually knew to ask for 32s or 64k straights instead of the 10-to-1 compressed lines of the era.
Again, for those situations where only a few translations are needed, DSPs will handle the problem just fine. My greater concern would not be the voice encoding scheme, since Darwin will take care of this over time. Rather, I see the other circuit and networking attributes, such as supervision (ABCD bits, E&M, loop start, ground start, etc.) and path finding (SS7/AIN, DNS, LDAP, etc.), being a much greater challenge to heterogeneous next-gen experiments, and the other forms of growing-pain-inducing exercises we are in store for over the next couple of years.
Francois Minard points to a couple of these in the March '99 Cook Report. Here are a few words from the introduction to his article:
Unfortunately, for genuine Internet Telephony to work, there are a lot of issues that have not been resolved, such as access to SS7-like databases by client-controlled Internet Telephony systems, or a new form of E.164 number resolution.
The general issue of alternative resolution technologies for IP services is coming to the forefront of IETF discussions. It is clear that DNS is not suitable for many of the applications that are now being discussed. In Internet Telephony, for instance, there is a need not just to resolve the IP end point necessary to complete a call but to discover the capabilities of that end point BEFORE the call is initiated. Is this a SIP or an H.323 session, for instance? In addition, with more applications, such as voice mail and fax, using SMTP, there is a need to have the telephony protocols identify the capabilities of the intelligent devices that populate the edge of the network. Much of the interest in Sun Jini, HP JetSend, and Microsoft Universal Plug and Play stems from this need.
It's important to know what the other end of the connection is capable of handling BEFORE the connection is established. That is an important capability for the future, despite its being a non-issue for all intents and purposes today, due to the overwhelming universality of the POTS model and the SS7 and AIN capabilities we currently employ.
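The idea can be sketched in a few lines: a directory keyed by E.164 number that answers the capability question before any call setup begins. Everything here is invented for illustration; the directory, its entries, and the `plan_call` helper are hypothetical stand-ins for whatever ENUM-, LDAP-, or SS7-style service eventually fills this role:

```python
# Hypothetical sketch: look up the far end's capabilities BEFORE dialing.
# The directory contents and field names are invented for illustration.

CAPABILITY_DIRECTORY = {
    "+12125551234": {"signaling": "SIP",   "codecs": ["G.711u", "G.729"]},
    "+33145551234": {"signaling": "H.323", "codecs": ["G.723.1"]},
}

def plan_call(callee, our_codecs):
    """Query the directory first; commit to a call only on a shared codec."""
    entry = CAPABILITY_DIRECTORY.get(callee)
    if entry is None:
        return None          # no capability record: fall back to the PSTN, perhaps
    common = [c for c in our_codecs if c in entry["codecs"]]
    if not common:
        return None          # no shared codec: a transcode (and its artifacts) would be forced
    return {"signaling": entry["signaling"], "codec": common[0]}

print(plan_call("+12125551234", ["G.729", "G.711u"]))
```

The point of the sketch is the ordering: the codec and signaling decision is made from the lookup result, before any session is initiated, which is exactly what DNS alone does not give you today.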
Well, that touches on a few of the issues, anyway. I'd be glad to hear other takes on this, as well.
Regards, Frank Coluccio