Hey, Ali --
Let me ask you: have you ever typed the Navier-Stokes equations in spherical coordinates into a computer using an ASCII keyboard and a standard font? Fortunately, it is possible to create a very large dictionary, in a computer or in the head, of more or less arbitrary signs and to map them onto phonetic words. Weirdly enough, there are parts of the brain that hold a collection of signs that the brain associates both with words and with specific real-world objects (such as a pig or a screwdriver). It is not yet clear whether all of these objects are the same across genders within a culture, across cultures, or, certainly, over time. Thus it is possible to speak in English of the "integral of this from here to there" and the "partial of this with respect to that." It gets very boring, but even this is remarkably easier than using a Chinese typewriter. Few would try to memorize the sounds of 20,000 or more hanzi without using a phonetic strategy (and the phonetic elements of each character). The problem of mapping speech into an arbitrary set of signs in a computer -- letters, hanzi, or mathematical equations -- is much simpler than teaching a high school student calculus or how to write a decent English essay. And the computers and software get better while the students seem to get worse.
We all appreciate the pioneering work of the great Danish laureate Victor Borge in inventing "phonetic punctuation," which made spoken Danglish so easy to understand. "Ladies and chentlemen, sput, I am here this efening ..." The use of "sput" for the comma and "sput-sput" for double quotes was inspired, but "s-s-s-pew-sput" for the question mark was sheer genius. Today, of course, we do not have to go into such explicit detail to be understood by our computer. The computer, bless its heart, can learn our unique style of rising inflection in the ultima instead of insisting on an s-s-s-pew-sput which, in time, can even short out a keyboard.
I think the computer does just about as well today at transcribing speech and mapping it into a standard written language as a human court reporter (some of whom not only stenotype, but also record and use computer interpretation to produce a near error-free transcript). I believe that soon this will be overkill, and both parties will stipulate to a computer-generated transcript (possibly backed up by audio or video tape). Computer-generated text attempts to record the words, not the sounds, of the spoken input. It misses the nonverbal cues one generates while speaking ("let the record show that the deponent sneered or snickered"). Since commands to the computer consist entirely of words or symbols (or names of symbols), it is duck soup to train the computer to interpret a wide variety of sequences of phonemes into (largely) unambiguous sequences of (computer) words. Going the other way -- from the computer text to spoken English or Chinese -- is no harder, but it will be a gas to put in the appropriate emotional content (John. Marcia. John? Marcia? John! Marcia-a-a). Until the computer learns to code affect into its interpretation, we will be unable to record or transmit it. Thus the ambiguities of email exchanges, the misunderstandings (complicated by inadequate or failing intelligence) of debates on the net, and the unexplained swerve of the robot automobile into oncoming traffic ("I LEFT my heart in San Francisco," hummed the drowsing driver shortly before his death).
Let's bet. I say that in 5 years most users will speak more into the computer than they type. This includes remote entry over our portable intelligent phones and written voicemail. Winner takes the loser to a live poetry recitation in the language of his choice.