Technology Stocks : Identix (IDNX)


To: steve who wrote (19711)   1/17/2001 4:57:25 AM
From: steve
 
Voice recognition improves, but suffers from selective hearing

January 16, 2001
Web posted at: 11:32 AM EST (1632 GMT)

NEW YORK (AP) -- In the movies, computers are always good listeners. In real
life, they only hear what they want to hear.

Much as people would like to speak with their machines, to browse the Internet
by voice rather than keystroke, recent strides in speech recognition technology
hardly provide the ease and spontaneity of a free-flowing dialogue between
humans.

Instead, the machines monopolize the discussion.

While the latest speech engines can recognize spoken words with better than 90
percent accuracy, a vast improvement from only five years ago, the machines
dictate the specific words and phrases that users can say. They ignore any
commands that stray.

New devices, new applications

Still, even with a scripted dialogue, the allure of "voice browsing" is strong,
especially for those trying to stay connected on mobile phones and handheld
computers with tiny keypads and screens. For drivers, the attraction is even
greater.

In less than three months, more than 200,000 of America Online's members have
signed up for AOLbyPhone, one of several new "voice portals" whose recorded
voices read small nuggets of online information to callers in response to set
spoken commands. AOL is the parent company of CNN.com.

Palm computer users are also showing interest.

In a survey of Palm users, "about 36 percent acknowledged that they used the
Palm and the cell phone as they were driving," said Tom O'Gara, chief executive
of MobileAria, a voice portal for cars due to be launched in June by Palm and
Delphi Automotive Systems.

"We suspect the number is higher," he added.

Although speech technology players such as IBM, Nuance Communications and
SpeechWorks International are working to develop more conversational systems,
none expect a major breakthrough any time soon.

"Natural language understanding is one of those Holy Grail areas," said Bill
DeStefanis, senior director of product management for Lernout & Hauspie, a
leader in dictation software and text-to-speech technologies, which are used to
make computers read aloud.

The main obstacle for natural speech technology involves simple brute power.

While huge gains in processing speeds have helped computers tackle the listening
part of the equation and understand specific words, it requires a lot more
firepower to make that machine comprehend the countless combinations of
words used to express thoughts.

Such constraints are less of a hindrance with computer dictation software,
where the main goal is to recognize spoken words and transform them into type.

Context is key

"You can try to anticipate how many ways somebody would recite a request and
hard code them into the software, but inevitably you're going to miss some," said
DeStefanis at L&H, which is currently pitching a new speech system for
navigating handheld computers -- a much more manageable task than trying to
master the entire dictionary. "If you limit the domain that you have to
understand, you can increase accuracy."

That's why the most effective use of voice-activation technology has been with
automated telephone systems that provide customer service for businesses like
airlines and credit card companies. Because they are usually designed for a
specific purpose, those speech engines can be customized to understand a more
select vocabulary, even if those words are spoken in different combinations.
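
[Editor's illustration, not part of the AP story.] The limited-domain idea
described above can be sketched in a few lines of Python. This is only a toy
under stated assumptions: the input is already-transcribed text rather than
audio, and the command names and keywords in COMMANDS are hypothetical, not
taken from any real voice portal. It shows why a small, fixed vocabulary is
easy to handle while anything outside it is simply ignored.

```python
import re

# Hypothetical command vocabulary for an imaginary voice portal.
# Real speech engines work on audio and use far richer grammars.
COMMANDS = {
    "news": {"news", "headlines"},
    "weather": {"weather", "forecast"},
    "email": {"email", "mail"},
    "directions": {"directions"},
    "movies": {"movie", "movies"},
}


def match_command(transcript):
    """Return the first command whose keyword occurs in the transcript.

    Word order and filler words do not matter, so the same small
    vocabulary can be spoken in different combinations. Requests
    outside the vocabulary return None.
    """
    words = set(re.findall(r"[a-z]+", transcript.lower()))
    for command, keywords in COMMANDS.items():
        if words & keywords:
            return command
    return None


if __name__ == "__main__":
    for phrase in ("Read me my mail", "What's the forecast?", "I need a pizza"):
        print(phrase, "->", match_command(phrase))
```

Widening the vocabulary means enumerating more keywords by hand, which is the
"inevitably you're going to miss some" problem DeStefanis describes.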

"Context is very important," said Steve Ehrlich, vice president of marketing at
Nuance, which has designed systems for the brokers Charles Schwab and
Fidelity Investments. "At Schwab, the need for a speech system that can recognize
`I need a pizza' is not crucial." The approach is similar at most of the new
telephone-based voice portals, which also include HeyAnita, BeVocal, Tellme and
Virtual Advisor, a driver-oriented service just launched in certain markets by
OnStar.

While the exact focus and presentation vary, all the services keep it simple,
limiting their scope to specific matters such as news, weather, e-mail, driving
directions and movie listings.

"The coolest thing is being able to get my e-mail read to me, like I'm the queen,"
said Jen Bekman, 31, an AOLbyPhone user in New York who develops Web
content for streaming video.

Users may struggle until they grow familiar with the "approved" commands for
navigating the different menus, but the speech recognition on these services is
rather impressive.

Tellme and BeVocal, for example, sparkled when put to the test with a series of
potential tongue-twisters delivered by cell phone, ably identifying the spoken
names of esoteric New York towns like Mamaroneck and Wurtsboro.

A Web of difficulties

Michael Lambert, a 26-year-old librarian and BeVocal user in Foster City,
California, occasionally struggled with the voice recognition at first, but says the
accuracy has improved to the point where there are no problems. "In the
beginning, I used to wonder what's going on. I'm from South Carolina, so I
wondered if it was an accent thing," said Lambert, who finds the service
especially useful in his car. "I just moved here in May, so it's very useful to just
pick up a cellphone in the car and get directions."

By contrast, most attempts to voice-navigate the full-blown Internet on a
personal computer are fraught with frustrations. Because each Web site is
designed differently, it's difficult to pack that much flexibility into a single
program on a PC.

A more realistic way to "voice-enable" the Internet, said Ira Brodsky, an industry
analyst for Datacomm Research, might be to customize each Web site with the
appropriate vocabulary.

"All you would have to do with a typical PC is have a microphone," said
Brodsky. "It also puts the Web site in a position to use personalization technology
and come to recognize your voice. It may identify your voice and remember
what you did the last time you were there."

The most prominent voice browsers for PCs, including Conversay and Ivan,
employ many of the same basic techniques to roam Web sites, numbering the
various links on a Web page so a person can "click" the number verbally. But the
makers of Ivan, One Voice Technologies, also decided to take a crack at natural
language, weaving some "artificial intelligence" into their browser so it can
recognize concepts in addition to specific commands.

However, all those bells and whistles make Ivan a bloated, sluggish program, a
problem Conversay avoids. Either way, both frequently make navigation
mistakes.
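
[Editor's illustration, not part of the AP story.] The link-numbering
technique mentioned above can be sketched with Python's standard-library HTML
parser. This is a guess at the general idea, not how Conversay or Ivan
actually work: the class and function names are hypothetical, the "spoken"
input is plain text, and only the numbers one through five are understood.

```python
from html.parser import HTMLParser

# Hypothetical spoken-number vocabulary; a real browser would cover more.
SPOKEN_NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def number_links(html):
    """Map 1, 2, 3, ... to the links found on the page."""
    collector = LinkCollector()
    collector.feed(html)
    return {n: href for n, href in enumerate(collector.links, start=1)}


def follow_spoken_link(html, spoken):
    """Return the link for a spoken number, or None if not understood."""
    number = SPOKEN_NUMBERS.get(spoken.strip().lower())
    return number_links(html).get(number)


if __name__ == "__main__":
    page = '<a href="/news">News</a> <a href="/weather">Weather</a>'
    print(number_links(page))
    print(follow_spoken_link(page, "Two"))
```

Because every page reduces to the same short list of numbers, the recognizer
never has to understand the page's own wording.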

Despite the current focus on speech-activated systems, the industry isn't shying
away from more ambitious projects.

Nuance, for example, recently partnered with Ask Jeeves, the Internet search
engine that invites users to type in requests with whole sentences rather than a
series of keywords.

"We'll be trying to leverage the natural language data that they capture on their
Web sites with all these people typing in questions, and build 'speech models'
based on that data," said Ehrlich at Nuance.

cnn.com

steve