IT WAS YOUR TYPICAL AIR TRAVEL nightmare. Glen Harvell, an Andersen Consulting manager, arrived at the Minneapolis airport to find his flight to D.C. had been delayed by mechanical problems. But like most peripatetic business types, Harvell is adept at dealing with airport chaos. While other passengers milled around in confusion, Harvell took out his mobile phone and called his company's automated "speech recognition" travel system. He quickly blazed through the computerized prompts and booked himself on another flight--without having to talk to a human being or punch in an annoying series of phone codes.
Until recently, speech recognition was used mostly as a substitute for typing on home computers. Speak into a microphone, and your words (or something close to them) magically appear in your word processor. But now the technology has improved to the point where businesses, primarily in the financial-service and travel industries, are turning to speech systems designed for use over the telephone. Speech promises to vanquish those frustrating touch-tone mazes, reduce the costs of providing 24-hour customer service and, for the rest of us, change the way we interact with everything from PCs to home appliances. "This could grow like the Internet, or even faster, because it doesn't require people in their homes to buy or learn anything new," says William Meisel, editor of the Speech Recognition Update newsletter.
Speech technology still has a lot to prove. Though the PC dictation systems have sold well, analysts estimate that about half of buyers stop using them after the first week: people are simply too used to typing. On the other hand, punching stock symbols and flight numbers onto crowded 12-button telephone keypads can drive even the most nimble-fingered road warriors to distraction. Silicon Valley's Nuance Communications, Boston-based SpeechWorks and old-timers IBM and Lucent Technologies are all competing to supply companies with speech-based alternatives. "People are expecting 24-hour service, and call centers are scrambling to figure out how to provide that in a cost-effective way," says Val Matula, a speech-technology planner at Lucent.
The financial-services industry was the first to try speech in a bid to make itself available any time, from anywhere. Its customers want instant access to commoditized info like stock prices and the ability to move money, even when they're away from their Web browsers. Discount brokers E*Trade and Schwab started using voice-based telephone systems over a year ago, and now full-service brokerages are jumping in. Fidelity introduced a system last December that lets users say things like "get my account balance" or "sell all my Microsoft stock"--or any variation on the phrasing of those requests. Such "natural language understanding" works by breaking up all words into the component sounds called phonemes--the "aaas" and "chs"--and calculating the most probable meaning. If you make a mistake, or say something the computer can't understand, it simply responds with longer and more specific instructions (for example, a yes-or-no question). "Once you are familiar with our system, the call can go very quickly," says Fidelity VP Judith McMichael.
Emboldened by the improving technology, the travel industry is following the financial brokerages. United and American Airlines are preparing to bring systems to the general public this year that give flight and gate information by voice request. But the Minneapolis-based Via World Network has gone further: its automated voice line, currently licensed to Northwest Airlines and Andersen Consulting, was the one Glen Harvell used to hop flights. The great advantage of the new speech technology, says Harvell, is that you can "barge in" whenever the computer is speaking and move on to your next request. "I'd never be that rude to a reservation agent," he says.
Voice could really break into the mainstream this spring. The New York-based MovieFone, which was bought earlier this month by America Online, intends to fulfill Cosmo Kramer's infamous request, "Why don't you just tell me the name of the movie you want to see?" Starting in New York, and then Los Angeles and Chicago, filmgoers can speak not only the movie's name but their ZIP code and credit-card number. Hedging its bets, the company will keep the old touch-tone system in the background.
Of course, speech technology isn't frustration free. The computer often tries to parse a cough or a loud breath. And accents can be tricky, though speech companies are constantly reviewing calls for different pronunciations. Another problem is that some jobs are inevitably being lost to speech systems. According to the Department of Labor, there were 159,000 telephone operators employed in the United States last year, about two thirds of the 1990 total. Though speech companies and their customers insist the new technology simply frees up workers for more valuable tasks, unions say companies are sacrificing jobs and the quality of customer service. "I've never heard anyone say, 'I hate to call that office because I get a human being,' but I hear the opposite all the time," says Kevin Kistler of the Office and Professional Employees International Union.
Still, it's hard to see how the inexorable spread of voice technology can be stopped. This was evident at last week's Toy Fair in New York City. At least half a dozen toys featured some degree of voice recognition. Among the offerings was a pigtailed doll called "My Best Friend," which can ask math and spelling questions and judge whether a child's answer is correct. The toy has many unlikely cousins, which are all testing the public's desire for speech-enabled gadgetry. The new Jaguar S-type sedan, out this May, will come equipped with voice-activated climate, audio and telephone controls (no voice-operated steering yet). A new cordless phone from the Fort Worth, Texas-based Uniden lets you dial by saying the names associated with numbers you store in its memory.
And there's more to come. Bill Gates has promised that within 10 years we'll control the personal computer with voice commands. Industry execs predict that ever speedier and cheaper silicon chips will make possible speech-enabled microwaves and VCRs within the next five years. It's a far cry from changing plane tickets on your mobile phone. But if speech recognition can lick the complexity of the PC and VCR, then it will certainly have people talking.
Father: "CAR. . .START UP. . .OPEN WINDOW"
Car: "WHERE ARE WE GOING? YOU NEED GAS"
Father: "PHONE. . .CAN I BOOK A FLIGHT TO PITTSBURGH. . .TUESDAY MORNING?"
Phone: "SEARCHING. . . . . .FIRST CLASS, OR COACH?"
Oven: "ALERT. . .YOUR CAKE IS BURNING!!"
Mother: "OVEN. . .TURN OFF NOW! TV. . .WHAT'S ON CHANNEL 3 AT 8:00?"
TV: " 'LOOK WHO'S TALKING' (PG)"
Daughter: "MONKEY. . .LET'S PLAY!"
Toy Monkey: "ALRIGHT! WHAT IS YOUR PASSWORD?"
Son: "COMPUTER. . .TAKE A LETTER"
Computer: "O.K. 'KIDDO' BUSINESS OR PERSONAL?"
HOW SPEECH RECOGNITION WORKS
Common sound patterns, or phonemes, are the backbone of the technology. Every word, phrase or sentence--no matter how complex--is a sequence of phonemes. Here's how MovieFone customers will use it:
1 After being prompted, the moviegoer says the name of the movie she wants to see.
2 Her sounds are converted to numeric values for comparison to phonemes.
3 Phonemes for each title are scored according to how well they match the spoken words.
4 The title with the highest score (here a 35) is used to retrieve movie information.
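For readers who want to see the scoring idea in miniature, here is a rough Python sketch of steps 3 and 4. It is an illustration only, not MovieFone's system: the phoneme spellings, the three-title catalog and the function names are all made up for the example, the caller's speech is assumed to have already been converted to a phoneme sequence (step 2), and Python's general-purpose difflib matcher stands in for the acoustic scoring a real recognizer would use.

```python
from difflib import SequenceMatcher

# Hypothetical phoneme spellings for a tiny catalog of titles.
# A real system would use a pronunciation dictionary and acoustic models.
TITLE_PHONEMES = {
    "Titanic": ["T", "AY", "T", "AE", "N", "IH", "K"],
    "Antz": ["AE", "N", "T", "S"],
    "Patch Adams": ["P", "AE", "CH", "AE", "D", "AH", "M", "Z"],
}


def score(heard, expected):
    """Return a 0-100 similarity score between two phoneme sequences."""
    matcher = SequenceMatcher(None, heard, expected, autojunk=False)
    return round(100 * matcher.ratio())


def best_title(heard, titles=TITLE_PHONEMES):
    """Score every title against what was heard and pick the top scorer."""
    scores = {title: score(heard, phonemes) for title, phonemes in titles.items()}
    winner = max(scores, key=scores.get)
    return winner, scores


# The caller said something close to "Titanic", with the last sound misheard.
heard = ["T", "AY", "T", "AE", "N", "IH", "G"]
winner, scores = best_title(heard)
print(winner, scores)
```

Because every title gets a score, a system built this way could also notice when the top two scores are close and ask the caller a follow-up question rather than guess.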