Automakers Struggle With Speech Recognition
techweb.com (12/04/00, 12:30 p.m. ET)
By Charles J. Murray, EE Times
The in-car PC boom that was supposed to be in full swing by now hasn't happened, and it may be delayed another 12 to 18 months as automakers and vendors run up against hurdles in implementing speech recognition systems.
The holdup is a disappointment for manufacturers that have invested millions in the development and promotion of in-car PCs. But because speech recognition systems are critical to addressing potential driver distraction issues, carmakers want them to be as close to perfect as possible. And they're not there yet, industry observers said.
"When we sign nondisclosure agreements and talk to automotive vendors, they all acknowledge that there are problems with speech recognition in vehicles," said Fred Nussbaum, vice president of business development at Clarity LLC, Troy, Mich., a maker of software-based speech-capture systems. "They're not going to talk about it publicly, because there are a lot of legal ramifications, but the problems are there."
Software makers and industry analysts said those problems are a key reason why Cadillac has delayed the introduction of its Infotainment system from fall of this year until late 2001, and why Visteon Corp. (stock: VC) has yet to put its ICES (information, communication, safety, and security) technology into a production vehicle.
Industry analysts also blame the lack of a good speech system for the dismal performance of Clarion's AutoPC. Clarion Technologies Inc. (stock: CLAR) had expected to be selling the system at a rate of thousands per month by 2000, but has sold just 3,500 units in two and a half years.
"Speech recognition is definitely a hurdle," said Thilo Koslowski, a senior analyst at GartnerGroup's e-business automotive service. "Manufacturers have to be very careful about deploying systems that are not 100 percent reliable because they could face lawsuits from consumers."
Eyes on the road
The race to create more effective speech systems is seen as critical for automakers. Several of them, most notably Ford and General Motors, have espoused an "eyes-on-the-road, hands-on-the-wheel" philosophy as they work to incorporate new electronic capabilities into automobiles. That philosophy is seen as especially important now, in light of the recent passage of state laws restricting drivers' use of cell phones.
But automakers said they can't bring eyes-on-the-road, hands-on-the-wheel techniques to vehicles unless they have good speech recognition systems. That's why General Motors has forged partnerships with General Magic Inc. (stock: GMGC) and Nuance Communications Inc. (stock: NUAN) to work on systems. It's also why Ford has allied itself with speech recognition developer Lernout & Hauspie Speech Products (stock: LHSEQ), which filed for Chapter 11 protection last week following management missteps.
Automakers said they plan to continue to work on speech recognition systems, but they deny there are problems.
"The technology is where we expected it to be," said Ed Chrumka, advanced technology manager at OnStar.
Indeed, OnStar representatives said the company's Virtual Advisor, an off-board speech-based service that provides e-mail, news, and stock quotes, is coming out as scheduled at the end of this year. Delivery of the system has already begun in the Northeast, and industry analysts said they are impressed by it.
"In the testing that we've done, it performed at a very high level," said Dawn McGreevey, an automotive analyst at Gomez, Lincoln, Mass., an Internet system quality-measurement company.
But in-car PCs, which use onboard electronics, have not fared as well. Cadillac's much-ballyhooed Infotainment system, which was already supposed to be available on the Cadillac Deville, is at least a year behind schedule. A General Motors spokesman declined to comment on reasons for the delay, except to say that there are "technical issues."
Similarly, a Visteon spokeswoman said the ICES system is in development programs with several OEMs, but would not say when it will reach production. At the 1998 Society of Automotive Engineers conference in Detroit, however, Visteon and Ford predicted the first units would be in vehicles by 2000.
Automotive engineers and software makers do acknowledge that equipping car systems for speech recognition has proved a more formidable task than had been expected.
"There's an overriding perception that speech recognition has no moving parts and is easier to implement than it really is," said Ron Risdon, vice president of business development at Conversational Computing Co., which makes the Conversay speech technology product. "Over-optimism is very prevalent."
Wall of sound
The crux of the problem is that vehicles, unlike desktop PCs, are subjected to a wide variety of noises that can confuse software-based speech recognizers. Compounding the problem is that in-car speech recognition is often done by remote servers over cellular links.
"Working with speech recognition over a cellular link is like doing magic," said a senior engineer who works for a major automaker. "You have to worry about more than just the noise generated by the vehicle. There are about 20 different sources of noise."
If speech recognition is done over a cellular link, the system also must deal with such issues as line echo, electrical interference, and poor signal strength.
Automotive engineers said the problems aren't insurmountable.
"It's not a matter of whether the technology is mature," OnStar's Chrumka said. "It's more an issue of the application of the technology in variable environments."
Software makers said the problems are magnified at higher vehicle speeds. Most voice recognition systems currently claim accuracies of 90 to 95 percent, but some said such claims are averages, which hold true at 30 mph but not at higher speeds. At 70 mph, for example, the accuracy figure dips to about 70 percent, according to some engineers. If occupants crack open a window, turn on the radio, or blast the air conditioner, the accuracy figures drop even more.
"Even if you have 90 percent accuracy, one out of every 10 phone digits that you dictate are going to be wrong," said Jim Wargnier, vice president of engineering at Clarity and a former engineer at OnStar and at Delphi Automotive. "At 70 percent, it's going to be extremely frustrating for customers, even if they have a great user interface."
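Wargnier's point compounds when a driver dictates a whole phone number. As a rough sketch, assuming each digit is recognized independently at the quoted per-digit accuracy (an illustrative simplification, not a claim made by anyone quoted here), the chance of getting an entire 10-digit number right on the first try can be estimated as follows. The function name `whole_number_accuracy` is hypothetical.

```python
def whole_number_accuracy(per_digit_accuracy: float, digits: int = 10) -> float:
    """Chance that every digit of a dictated number is recognized correctly.

    Assumes recognition errors on different digits are independent,
    which is a simplification made only for this illustration.
    """
    return per_digit_accuracy ** digits


# Per-digit accuracies taken from the figures quoted in the article.
for accuracy in (0.95, 0.90, 0.70):
    result = whole_number_accuracy(accuracy)
    print(f"{accuracy:.0%} per digit -> {result:.1%} for a 10-digit number")
```

Under that assumption, 90 percent per-digit accuracy yields a correct 10-digit number only about 35 percent of the time, and 70 percent per-digit accuracy yields one less than 3 percent of the time.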
Some engineers dispute the 70 percent accuracy figure, even for high-speed applications. The number is greatly exaggerated, they said, and automotive engineers have found ways to deal with high speeds.
"As cars go faster, wind noise rises, but any good speech recognizer changes itself to accommodate that," said Scott Pyles, director of product management for Lernout & Hauspie's automotive products.
Some engineers also said unexpected noises are of greater concern than high speed.
"The issue isn't steady-state noise," Chrumka said. "The big things that affect voice recognition are the variables -- kids in the backseat, windows opening and closing, pops and cracks in the cabin."
Automakers are concerned about even the subtlest lack of accuracy because it could place greater cognitive load on the driver, who theoretically should be free to concentrate on traffic and driving conditions.
"It should only take you so long to dial a cell phone, tune the radio, or turn on the air conditioner," Wargnier said. "There are a lot of legal ramifications for those companies if there are problems or if they place too much cognitive load on the driver."
Stories of drivers struggling with voice recognition systems are already commonplace, even though the technology has been available for only a short time. Such stories are a concern among industry analysts as well as automakers.
"If you want to change the radio station but you have to repeat the command 10 times in order to make it happen, that's a big problem," Gartner's Koslowski said. "Even though the system is voice-controlled, you still end up concentrating too much on changing the radio station, and that affects your driving."
Some believe the dilemmas facing automotive speech recognition may be a result of hardware rather than software.
"It may be a particular problem having to do with the processing power inside the car, as opposed to the speech technology," said Bill Meisel, president of TMA Associates, a speech industry marketing and consulting company.
If that's the case, the problem would be more focused on in-car PCs, such as the Infotainment system or ICES, Meisel said.
"Server-based systems processing voice over wireless connections would be less prone to problems because they can have as much memory and as much speed as they need," he said.
Systems such as OnStar's Virtual Advisor use off-board, server-based processing.
Loud and clear
Kurt Sievers, automotive marketing manager at Philips Semiconductors, said the company has had success running voice recognition on its Hello IC in moving vehicles.
"The usage scenario is difficult, but it has been done," Sievers said.
The company has demonstrated recognition over a 300-word vocabulary for command and control apps that can, for example, use voice to turn the volume on a radio up or down.
"We've had it on a test track at 120 kilometers an hour with the window open, and it still recognizes the driver's voice," Sievers said.
There are strategies to deal with voice recognition in a car with multiple passengers, said Corado Giorgetti, director of business development at ALST, an Israeli joint venture between Altec-Lansing and STMicroelectronics NV (stock: STM) that was created to develop speech DSP technology. For example, audio systems can be set up to "listen" preferentially to the person behind the steering wheel and treat other voices as "noise" to be canceled, he said. But it is not yet clear how successful such strategies are.
Separately, ST has developed a specialized 24-bit DSP-based chip called Euterpe that can perform the functions of voice recognition, text-to-speech rendition, noise and echo cancellation, and biometric verification on audio data streams. At present, the ST system is good for command and control, according to Paolo Gonella-Pacchiotti, car multimedia business unit director at the company.
"We are moving toward continuous speech recognition," Gonella-Pacchiotti said.
Some software makers, such as Clarity and Conversay, believe the solution lies in the use of specialized software and better microphones. Clarity, for example, offers a technology known as Clear Voice Capture, which extracts the voice signal of interest. The company said the technology provides an improvement over noise suppression systems, which have difficulty with signals that have components overlapping with voice signals.
Similarly, Conversay offers filtration techniques that separate speech signals from noise signals and narrowly focus on the speaker. The system employs two microphones -- one on the passenger side and another on the driver side -- and is focused more on distributed speech, for which processing power is split between the client and server.
Engineers are also reportedly looking at microphone technology as a way to boost accuracy. But the best, the so-called array microphones, cost between $100 and $180, and that's beyond the acceptable limit for automotive applications.
Many in the industry are unconvinced by automakers' claims.
"The reality is that today's systems are still failing in a lot of different modes," Clarity's Nussbaum said. "But the technology will get better before it reaches the market. Right now, we just don't know when that will be."
Peter Clarke contributed to this report.