CANDICE BERGEN MAKES IT LOOK SIMPLE. In the Sprint Foncard ads, she merely tells the telephone to "Call home" without ever lifting a manicured finger. Major computer manufacturers are also touting the power of speech. Apple and Compaq, for example, sell machines that can respond to brief orders, like "Computer off" or "Open file." This is just baby talk compared with what the future could hold: computers that take fast dictation, a microchip in the car dashboard that answers when you ask directions, a calendar you can order to remind you to send your spouse flowers on your anniversary.
Voice recognition, long a staple of science fiction, is rapidly moving into the real world. To be sure, there are still a lot of bugs to work out (one Foncard user recently tried to trick Sprint's system by asking it to "Call Penguin"; the computer connected him to his office). But it's already clear that voice technology will ultimately change the way people use machines. The major attraction is ease of use, especially for technophobes intimidated by keyboards, mice and a profusion of icons littering the screen. Another force pushing development is the popularity of subnotebook computers with ever-smaller keyboards. "No one wants to enter text by toothpick," says Janet Baker, president of Dragon Systems of Newton, Mass., which makes speech-recognition software. The explosion of computer-related hand and wrist ailments has also propelled development. If you're talking, your fingers do less walking.
Voice recognition may eventually make life easier, but the technology behind it is complex. Sound is broken up into a range of frequencies, which the computer analyzes and records as a series of numbers. The result is a digital "picture" of the sound that the computer compares with a library of word pictures.
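The matching idea described here can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions and does not describe how any product mentioned in this article works. The "picture" is a single vector of energy in eight frequency bands for the whole word, whereas real systems take a sequence of such slices over time. The library is a plain dictionary, and the helper names `band_energies`, `normalize`, `closest_word` and `tone` are hypothetical.

```python
import cmath
import math

def band_energies(samples, n_bands=8):
    """Break a sound into frequency bands and record each band's energy as a number."""
    n = len(samples)
    # Naive discrete Fourier transform, keeping the first n // 2 frequency bins.
    spectrum = [
        abs(sum(s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(samples)))
        for k in range(n // 2)
    ]
    band_size = len(spectrum) // n_bands
    return [sum(spectrum[b * band_size:(b + 1) * band_size]) for b in range(n_bands)]

def normalize(vec):
    """Scale to unit length so loud and soft versions of a word look alike."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def closest_word(picture, library):
    """Compare a word picture against the library and return the nearest entry."""
    target = normalize(picture)
    return min(library, key=lambda word: math.dist(target, normalize(library[word])))

def tone(bin_index, n=64):
    """Hypothetical stand-in for a recorded word: a pure tone at one frequency bin."""
    return [math.sin(2 * math.pi * bin_index * t / n) for t in range(n)]

# A two-word library of stored pictures.
library = {"yes": band_energies(tone(4)), "no": band_energies(tone(28))}
print(closest_word(band_energies(tone(5)), library))  # prints: yes
```

Normalizing before comparing is a design choice. It makes the match depend on the shape of the spectrum rather than on how loudly the word was spoken.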
The accuracy of a particular program depends largely on how big that library is and how many words the computer has to recognize. Sprint's program, introduced last month, works reasonably well because its library is limited to just a few words. Callers dial an 800 number and give the computer at the other end a password, usually a social-security number. The computer makes a voiceprint of the spoken password and compares it with one recorded in advance, much like the signature on file at the bank. If the prints match, the computer lets you ask for one of 10 preset numbers.
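The two-stage flow described here can be sketched hypothetically as follows. The caller's voiceprint is checked against the enrolled one, and the caller may then reach one of at most 10 presets. The article does not describe Sprint's actual method. The voiceprints are assumed to be short feature vectors, such as band energies. The cosine-similarity test, the 0.9 threshold, and the names `VoiceDialer`, `verify` and `call` are all illustrative assumptions.

```python
import math

# Hypothetical similarity cutoff; the article gives no figure.
MATCH_THRESHOLD = 0.9

def cosine_similarity(a, b):
    """1.0 for identically shaped voiceprints, near 0.0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

class VoiceDialer:
    """Illustrative only: verify the caller's voice, then dial a preset number."""

    MAX_PRESETS = 10

    def __init__(self, enrolled_print, presets):
        if len(presets) > self.MAX_PRESETS:
            raise ValueError("at most 10 preset numbers are allowed")
        self.enrolled_print = enrolled_print  # voiceprint recorded in advance
        self.presets = dict(presets)          # spoken label -> phone number
        self.verified = False

    def verify(self, spoken_print):
        """Compare the spoken password's voiceprint with the one on file."""
        score = cosine_similarity(spoken_print, self.enrolled_print)
        self.verified = score >= MATCH_THRESHOLD
        return self.verified

    def call(self, label):
        """Return the preset number for a label, or None if there is no such preset."""
        if not self.verified:
            raise PermissionError("caller's voiceprint has not been verified")
        return self.presets.get(label)

dialer = VoiceDialer([0.2, 0.9, 0.4], {"home": "555-0100", "office": "555-0199"})
print(dialer.verify([0.25, 0.85, 0.4]))  # prints: True
print(dialer.call("home"))               # prints: 555-0100
```

Returning None for an unknown label is a design choice. A system that instead picked the nearest-sounding preset could produce the kind of misfire seen when a user asked for "Penguin" and reached his office.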
Sprint has been developing the system since 1988, when company security officials noticed an alarming increase in "shoulder-surfers" who prowled airport phone banks with binoculars, looking to write down calling-card codes as they were punched in by unsuspecting travelers. Sprint claims that the Foncard is 95 percent accurate and much more secure than a regular calling card.
More ambitious voice-recognition programs, such as dictation, require computers to use artificial intelligence. The machine has to understand context to distinguish among words like two, too and to. And the library of word pictures has to be much larger and geared to a user's requirements. Given these limitations, "today's technology is not as fast as a good typist," Baker says, although she expects great strides in the next five years.
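The article credits context handling to artificial intelligence without saying how it works. The sketch below uses a much simpler statistical stand-in, offered only as an illustration. It scores each possible spelling by how often that spelling appears next to the neighboring words in a table of word-pair (bigram) counts. The counts, the `<s>`/`</s>` start and end markers, and the function names are invented for the example.

```python
# Invented word-pair counts; a real system would gather these from large amounts of text.
BIGRAM_COUNTS = {
    ("me", "to"): 30, ("me", "too"): 20, ("me", "two"): 1,
    ("to", "send"): 50,
    ("buy", "two"): 15, ("buy", "to"): 1,
    ("two", "roses"): 20, ("to", "roses"): 2,
    ("too", "</s>"): 40,  # "</s>" marks the end of the utterance
}

def score(prev_word, word, next_word):
    """How well a candidate spelling fits between its two neighbors."""
    return BIGRAM_COUNTS.get((prev_word, word), 0) + BIGRAM_COUNTS.get((word, next_word), 0)

def transcribe(heard):
    """heard: one list of candidate spellings per spoken word; returns the chosen text."""
    words = []
    for i, candidates in enumerate(heard):
        if len(candidates) == 1:
            words.append(candidates[0])
            continue
        prev_word = words[-1] if words else "<s>"  # "<s>" marks the start
        if i + 1 == len(heard):
            next_word = "</s>"
        elif len(heard[i + 1]) == 1:
            next_word = heard[i + 1][0]
        else:
            next_word = None  # the next word is ambiguous too, so ignore it
        # Sorting makes ties deterministic: max keeps the first best candidate.
        words.append(max(sorted(candidates), key=lambda w: score(prev_word, w, next_word)))
    return " ".join(words)

SOUNDS_LIKE_TO = ["to", "two", "too"]
print(transcribe([["remind"], ["me"], SOUNDS_LIKE_TO, ["send"], ["flowers"]]))
# prints: remind me to send flowers
```

The model looks at the words on both sides of the ambiguous one. If it used only the preceding word, "me" would always favor "to", and "call me too" would come out wrong.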
Outside the office, voice recognition is especially useful when your hands are busy with other tasks, like driving, says Chris Schmandt, a principal research scientist at MIT's Media Lab. Instead of fussing with the radio dials, you could just turn off Howard Stern with a simple command. Voice recognition can also eliminate the need for key pads on car phones. "We're going to be using our mouths and our ears more," predicts Schmandt. So forget all that stuff in the manual. Can we talk?
Automated voice-recognition systems promise the most seamless way for humans to communicate with machines. How do computers hear what we want? Here's a look:
Speaking into a microphone or telephone, a person tells the computer to carry out an action, using a limited, predetermined vocabulary to give instructions.
The computer breaks the sound of the voice into syllables and then records the instructions as a pattern of highs and lows spread across a spectrum of frequencies.
Once the sound is disassembled, it is stored in the computer's memory as an electronic map of the original command, a binary collection of zeroes and ones.
A database scanning program looks through a library of word maps until it finds one that matches the original. Once the match is made, the computer has "recognized" the command.
If there is more than one word with the same sound ("to" and "two," for example), the computer uses artificial intelligence to figure out from the context which word is right.
Once the computer decides what the word means, it takes the order from the user and acts on it without a single keystroke.
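The sidebar's steps can be strung together in a toy Python sketch that covers the binary map, the library scan and the final action. The homophone step is left out. Everything here is illustrative. The 8-bit maps, the 0.5 threshold that separates highs from lows, the two-mismatch tolerance, the command library and the function names are all assumptions. The input is taken to be band energies that have already been extracted from the voice.

```python
def to_bit_map(band_energies, threshold=0.5):
    """Turn the pattern of highs and lows into zeroes and ones."""
    return [1 if e > threshold else 0 for e in band_energies]

def hamming(a, b):
    """Count the positions where two bit maps disagree."""
    return sum(x != y for x, y in zip(a, b))

def recognize(bits, library, max_errors=2):
    """Scan the library for the closest word map; reject matches that are too poor."""
    best = min(library, key=lambda command: hamming(bits, library[command]))
    return best if hamming(bits, library[best]) <= max_errors else None

# Hypothetical stored maps: one bit per frequency band.
LIBRARY = {
    "open file":    [1, 1, 0, 0, 1, 0, 0, 1],
    "computer off": [0, 0, 1, 1, 0, 1, 1, 0],
}

# What the computer does once a command is recognized (stand-ins for real actions).
ACTIONS = {
    "open file": lambda: "file opened",
    "computer off": lambda: "shutting down",
}

def handle(band_energies):
    """Digitize, match, then act on the command without any keystrokes."""
    command = recognize(to_bit_map(band_energies), LIBRARY)
    return ACTIONS[command]() if command else "not recognized"

print(handle([0.9, 0.8, 0.1, 0.2, 0.7, 0.1, 0.3, 0.6]))  # prints: file opened
```

Raising `max_errors` lets slightly garbled commands through but also makes it more likely that the wrong command will match. Lowering it causes the system to reject more input.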