Speech Recognition / Synthesis Core Delivers 97% to 100% Reliability
Business Editors/High-Tech Writers
LEUVEN, Belgium--(BUSINESS WIRE)--Oct. 25, 1999--
Available as HDL, ASIC and DSP Object Code
SoC Versions Available for as Little as $1.40 Each
Frontier Design today introduced a C-language speech recognition and speech synthesis (SRS) core for use in mobile phones, remote controls, automobile climate control, toys and other applications with a voice-controlled user interface. The core consistently delivers 97% to 100% accuracy (typically higher than 98%) in multiple languages, including English, French, German, Dutch, Hebrew and Chinese. Frontier will demonstrate the core at DSP World in Orlando, Florida, November 3-4, 1999, at Demo Pod No. 156.
Available in DSP Object Code, HDLs or ASIC Implementation -- The SRS core is available in object code for DSP or RISC processors and PC platforms; as a VHDL or Verilog core, with or without interfaces to other on-chip logic; as a cell-based SoC including codec and amplifiers; or in a complete OEM module that includes speaker, microphone, IFR, RF and other functionality.
The C-language core currently runs on DSP processors from Texas Instruments (TI320C62XX) and National Semiconductor (CR16B DECT core). It can be compiled to run on any DSP or RISC processor. Because the speech recognition algorithm requires only 5 to 10 MIPS, any existing pager, mobile telephone, or other system with 5 MIPS of spare processing power can include the SRS core with no extra overhead. The memory requirement for speech recognition templates is less than 1 Kbyte of RAM per word (30 to 40 templates require at most 40 Kbytes of RAM). Synthesized speech takes up less than 1 Kbyte per second: 16 Kbytes of ROM accommodates 20 seconds of high-quality synthesized speech.
97% to 100% Accuracy -- In speaker-dependent applications, the SRS core consistently delivers recognition performance between 97% and 100%. When tested using the TI20 standard vocabulary of 3,200 words, an overall accuracy of 98.7% was measured. The TI20 standard is a subset of the TI46 audio database published by the U.S. National Institute of Standards and Technology (Disc 7-1.1, September 1991).
This database is used to test speech recognition algorithms and products against a standard set of English words spoken by sixteen different male and female speakers. It includes multiple utterances of the digits zero to nine, ten computer commands and 26 letters. The TI20 test uses a 3,200-word subset of the TI46 database and is frequently used to test DECT voice dialing applications.
Frontier's SRS core consistently gets high recognition scores regardless of the user's language. It has been tested using English, French, German, Dutch and Chinese.
Herman Beke, Frontier's CEO, said, "Our C-language speech recognition and synthesis core is the ideal solution for any application that needs a voice-controlled user interface. Our A|RT design methodology, which we used to implement the core in various implementations, lends itself to design reuse because the basic functionality is always maintained in the C language. C-language designs are much more easily optimized for power, performance or cost than are HDL implementations. As a result, functionality and customer requirements are guaranteed before we generate the customer implementation.
"We are finding that we are beating our competition consistently in this marketplace. I think this is because we are able to provide our customers with the implementation that meets their cost, power, functionality and time-to-market budget. Using the same C-language, system-level IP core, our customers can get to market quickly using a DSP processor or an FPGA and then, as product demand warrants, quickly and easily convert the design to a low-cost ASIC implementation. This capability is critical to success in this marketplace," Beke concluded.
Voice Control Capability for DECT Phones -- Frontier's SRS core has been ported to National Semiconductor's CR16B core to provide users of National's DECT chipset with voice-controlled dialing.
Asmund Tielens, Managing Director of National's Design Center in the Netherlands, said, "Our DECT customers are demanding voice-activated dialing. However, getting the voice recognition quality required to make these products marketable is very difficult to do and can be quite expensive. Frontier Design generated object code for the CR16B that allows us to offer speech recognition with no modification to our hardware and no degradation in performance. Before dialing, when the processor is not being used for transmission, it has idle capacity that is used by the speech recognition core for voice-activated dialing. The entire speech recognition and synthesis functionality, including the speech recognition templates, takes up only 20 Kbytes, which fits in the flash memory already on the CR16B core. As a result, we can offer our customer an extremely high-quality speech recognition capability with absolutely no hardware redesign and minimal added cost. We think this will enhance our competitive edge in the DECT marketplace."
Ultra-Low-Power, Low-Cost SoC Implementations -- Frontier Design has also created low-power, low-cost system-on-a-chip (SoC) implementations that incorporate the speech recognition memory, synthesis ROM, A/D converter, microphone amplifier and PWM speaker stage. The SoC requires no external components. Additional functionality, such as echo cancellation, speech compression, or caller-ID detection, can be added with a minimal increase in gate count. For example, adding echo cancellation to the SRS core would require only 1,000 additional gates.
Complete OEM systems are also available that include microphone, speaker, battery and packaging. Frontier Design has developed one such speech recognition and synthesis system for Columns Ltd. of Singapore. Its first application is a voice-controlled currency translation device. The design includes a complete speech recognition system-on-a-chip, microphone, and speaker.
The speech recognition SoC was designed using Frontier's C-to-Silicon design methodology and includes the SRS co-processor, ADC, DAC, serial interface, 30 Kbytes of RAM for the storage of speech recognition templates, and an additional 20 Kbytes of ROM for the storage of high-quality speech synthesis phrases. The 15,000-gate SRS SoC delivers 2 MIPS throughput at 2 MHz, with an average word recognition latency of only 250 ms. The device consumes only 6 uA in standby and 16 mA during speech recognition and synthesis (while driving the speaker). Using a single CR2-430 battery, the product built on the ASIC has a planned battery life of two years.
According to Cees Heikamp, Columns' president, "Columns' business is the development and marketing of low-cost, high-quality voice-controlled products for the consumer market. In developing our product plan we exhaustively evaluated all the speech recognition and synthesis alternatives available. We looked at software options and off-the-shelf speech recognition chips. The Frontier solution is the only one that allows us to achieve an exceptionally high-quality customized speech recognition solution, with a very small footprint, that is extremely low power and cost effective.
"Our first product using this core is a voice-controlled currency translator. The user tells the device the amount and currency to be translated and the currency into which it will be translated. The currency translator then tells the user the answer. It consistently provides 97% to 100% accuracy in virtually every European language and has an expected battery life of three years.
"We expect to use this same IP for many future products, such as voice-controlled calculators and toys. These are low-cost items that are typically purchased by companies to distribute to their customers for free, so keeping costs down is critical. At less than $2, the Frontier solution represents a 50% cost savings over any comparable-quality alternative," Heikamp concluded.
Advanced Voice Recognition Algorithms -- Frontier Design employs the Mel Frequency Cepstrum Coefficient (MFCC) algorithm for acoustic feature extraction; continuous noise level estimation to eliminate background noise; coarse and fine word boundary detection to define the word boundaries; and a Dynamic Time Warping (DTW) algorithm to identify the utterance.
-- Mel Frequency Cepstrum Coefficient Algorithm -- The Mel scale is a frequency scale on which the sensitivity of the human ear to frequency variations is equal across the spectrum. Mel scaling results in less frequency sensitivity at high frequencies. The MFCC algorithm consists of the calculation of an FFT power spectrum, followed by Mel scaling, a logarithm and an inverse cosine transform (iDCT). This transformation is performed on overlapping frames of samples that have been Hamming-windowed.
-- Continuous Noise Level Estimation -- The noise level estimation routine operates continuously, adapting to variations in the level of the background noise. It uses multiple estimates and a selection algorithm to identify and eliminate background sounds and speech artifacts (e.g. breathing, saying "uh").
-- Coarse Word Boundary Detection -- Coarse word boundary detection determines when a whole word has been pronounced, based on the energy contour, energy level and energy pulse duration characteristics of the audio signal.
-- Fine Word Boundary Detection -- The fine word boundary detection algorithm separates irrelevant sounds (e.g. mouth clicks, breath noise, microphone rumble and background sound) from the word by performing a detailed analysis of the energy levels during and surrounding the word.
-- Dynamic Time Warp Algorithm -- The DTW algorithm is used to actually identify the word. DTW compares series of vectors that have unequal length and duration variations within the series. The resulting DTW distance is the weighted average difference between the feature vectors of the compared utterances, independent of their absolute time position in the energy pulse but dependent on their relative position in the acoustical variations within the energy pulse, allowing them to be compared based only on their acoustical features. The DTW distances between the utterance and all stored templates are computed simultaneously in a single forward pass. The unknown word is identified as the stored template that best matches it, based on the closest length and the smallest calculated DTW distance. If the DTW distance is too high, or if the first and second best matches are too close to each other, the word is rejected as unclear.
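For readers who want to see the matching and rejection step in concrete form, it can be sketched in a few lines of Python. This is an illustration only, not Frontier Design's implementation: the function names, the Euclidean frame distance, the normalization by the combined sequence length, and the two threshold values are all assumptions made for the example, and the template-length criterion mentioned above is left out for brevity.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (an assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """DTW distance between two sequences of feature vectors.

    The accumulated cost is divided by len(seq_a) + len(seq_b), an assumed
    normalization that keeps long and short words comparable.
    """
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i frames of seq_a
    # with the first j frames of seq_b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch seq_b
                                 cost[i][j - 1],      # stretch seq_a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m] / (n + m)


def recognise(utterance, templates, max_distance=1.0, min_margin=0.1):
    """Return the best-matching word, or None if the match is unclear.

    max_distance and min_margin are hypothetical thresholds: the first
    rejects matches that are too far away, the second rejects cases where
    the best and second-best templates are too close to each other.
    """
    scored = sorted((dtw_distance(utterance, t), word)
                    for word, t in templates.items())
    if not scored:
        return None
    best_distance, best_word = scored[0]
    if best_distance > max_distance:
        return None
    if len(scored) > 1 and scored[1][0] - best_distance < min_margin:
        return None
    return best_word


if __name__ == "__main__":
    # Toy one-dimensional "feature vectors"; real MFCC frames have more.
    templates = {"yes": [[0.0], [1.0], [2.0]], "no": [[5.0], [5.0], [5.0]]}
    print(recognise([[0.0], [0.0], [1.0], [2.0]], templates))
```

In the toy example the utterance is a time-stretched copy of the "yes" template, so warping aligns it at zero cost while the "no" template stays far away. An utterance far from every template, or one that two templates match almost equally well, returns None, mirroring the "rejected as unclear" behavior described above.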
Pricing & Availability -- Frontier Design's C-language speech recognition and synthesis core is available now. The object code is available for a per-chip license fee as low as $0.30 in quantities of 1,000,000 or more. Unpackaged ASIC implementations are available for $1.40 each in quantities of 500,000 or more and $2.40 each in volumes of 1,000 units, plus $25,000 to $75,000 for design services and $30,000 for ASIC prototypes. ASIC packaging adds approximately $0.40 per unit. Buyout license schemes are also offered.
Frontier Design was founded in 1997 to develop a next-generation system-level design methodology called A|RT (Algorithm to Register Transfer) and to sell innovative EDA products, complex intellectual property cores and design services based on this methodology. A|RT EDA tools start from a specification in the C language. They are used by Verilog or VHDL hardware designers to improve design productivity and design quality in terms of product cost, power consumption and performance. The A|RT design methodology supports existing system-level design flows provided by companies such as Cadence Design Systems (NYSE:CDN), Mentor Graphics (Nasdaq:MENT) and Synopsys (Nasdaq:SNPS) by offering a quick and easy path from the C language to Verilog or VHDL. System-level IP cores offered by Frontier Design include the complete Layer 1 GSM baseband, G.723.1 and G.723.1, low bit-rate speech compression, high-performance, low-cost echo cancellation and others, as well as speech recognition and synthesis. Frontier Design sells its tools, IP and design services from its facilities in California; Florida; Leuven, Belgium; Tiel, The Netherlands; Tokyo, Japan; and through a growing number of distributors and representatives in North America, Europe, Japan, and the Pacific Rim.