The next step in digital video?
Vision Of Tomorrow -- Seeing Is Computing
Aug. 10, 1998 (Computer Reseller News - CMP via COMTEX) -- Thanks to voice-recognition software, computers are beginning to talk and listen. But to date they have remained largely blind. That is going to change as researchers improve the way computers recognize image patterns. In fact, analysts and researchers said that within the next five to seven years, personal computers will add another sense: the ability to "see."
There are several ways computers will first put their newfound abilities to use, said Aaron Bobick, associate professor of computational vision at the MIT Media Laboratory, Cambridge, Mass. Some will be based on the computer vision capabilities of today, which already are starting to play a role in controlled industrial environments and high-end modeling for medical applications, he said.
A large part of the problem in getting computers to see is not how much computational power is available, said Matthew Turk of Redmond, Wash.-based Microsoft Corp.'s Microsoft Research arm. Rather, it is knowing how to frame the problem of "seeing" in a way the computer can understand and process as data flows in through its camera, or eyes, Turk said.
"There has been a whole vision community out there for 30 years scratching their heads wondering how to give computers vision. We are certainly making progress, but we're not quite there yet," he said. "The biggest hurdle is knowing how to construct the problem. It's not a bandwidth or computational problem. Sometimes when you get faster computers you just get to scratch your head faster."
Still, even though computers will not soon be able to interpret classical art or the ballet, they already are getting pretty good at recognizing football plays, said industry observers. "If the New England Patriots run a play, the system can actually start to tell what play they ran," said MIT's Bobick.
Trying to get the computer to interpret a raised eyebrow or one of the thousands of other small and subtle facial movements people make, however, is a different ball game. "Many people wonder why, if we can get the computer to read football plays, we can't get the system to read facial expressions. Well, there are an awful lot of things that have to happen in each given play, and there are a very small number of plays, maybe 100," said Bobick. "This gives us a lot of evidence for each play so we can build local relationships between players. Such as, if the center hands off to the quarterback and there is a certain block, through seeing this information the computer is capable of deducing reliable answers as to what play just occurred," he added.
In fact, entertainment could be the first area to benefit from computers' improved visual capabilities, said research experts. For example, the Media Lab ran KidsRoom, a project in which researchers used computer vision as input for the narration of an interactive story. As children moved around a room, the story reacted in real time based on what it "saw."
"These kinds of things allow you to have real interactive experiences in a large scale. So instead of point and click on a screen, it's a room-size scale and here computer vision will provide an opportunity for entertainment. At first, it's going to be at the large-scale entertainment centers built by the likes of Universal Studios, Disney and Sony; places willing to put in the effort and time needed to build something no one has done before," said Bobick.
Another up-and-coming market is security. Although systems will be unable to tell whether something wrong or criminal is happening at a mall, for example, they will be able to look at crowded areas and tell when things do not seem normal, experts said. Systems would then notify a security department, which could send a guard to investigate.
Vendors and VARs will build vision-equipped systems for banks, automated-teller machine environments and the government to help reduce some types of crime, said both Bobick and Turk. For example, an ATM could be equipped to see that the person who inserted the ATM card is not the rightful account holder, experts said.
Computers are still a long way from meeting the challenge of reading expressions and facial movements. But organizations are making progress, research executives said. "We are working on the goal of being able to see where a user is looking on the screen, so the computer can prepare in advance, even a couple of milliseconds," said Turk. In the future, the graphical user interface could infer that users are about to print when they look at the printer icon for a certain length of time, he said.
Integrators can see early inklings of this bleeding-edge research today. Microsoft Research recently made available version 1.0 of its Vision Software Development Kit, a low-level library of object definitions and related software for use with Visual C++, intended to help developers and researchers create image-processing and live image-acquisition applications.
George V. Hulme is a freelance business and technology writer based in Croydon, Pa.
By: George V. Hulme
Copyright 1998 CMP Media Inc.