techweb.com
November 15, 1999, Issue: 1087
Section: SYSTEM DESIGN -- FOCUS: Advanced Video Processing

Net products accelerate design cycle

Steve Morton, President and Chief Executive Officer, Oxford Micro Devices Inc., Monroe, Conn.
It has been only a few years since the electronics industry was forced to grapple with the problem of providing products for a rapidly evolving Internet. Before that, it was enough simply to design a chip to implement a standard for, say, MPEG-2 compression. Now, however, it might take years to complete a standard, plus the time it takes to implement a chip.
But the industry is running on Internet time, not ASIC time, so the only way to respond to the need for rapid change is to have programmable solutions. The problem then becomes how to provide a platform that is powerful enough to handle the volume of data generated by multimedia systems and yet is still flexible enough to be programmed easily.
There is an insatiable demand for better-quality images, smaller file sizes, faster response, smaller form factors, lower power dissipation, more flexibility and, of course, lower cost. People do not want to wait for their fingerprint to be verified, or for their digital camera to finish processing the last image before it is ready for the next one.
Also complicating the picture is the increasing demand for convergence. Rather than having a desk full of individual products, each dedicated to a specific task, it is now feasible to combine functions. Last year's popular product combined TVs with Windows computers and Internet access. Now low-cost Web cameras, still cameras and speech-recognition systems are being added. On the horizon are fingerprint-recognition systems.
With state-of-the-art technology it is possible to merge these systems, but it requires sophisticated software and powerful hardware. ASIC technology simply is not flexible enough to handle this demand. It may be possible to implement a palette of operations efficiently in an ASIC, but that solution would only be one fixed point in a constantly changing mix of standards.
Doing convergent design economically in a small form factor is where the crunch comes. Currently available processors such as high-performance DSPs and microprocessors, while offering performance and the flexibility of programmability, were perfected in different application environments.
The problem is no longer how to squeeze a bunch of multipliers into a chip. With today's densities of 50,000 and more gates per square millimeter, the problem is one of organization. How to organize this hardware so that it can easily be used, and used efficiently, becomes the design challenge.
The multimedia instructions that Intel Corp. (Santa Clara, Calif.) has added to its processors, and the AltiVec extensions that Motorola Inc. (Schaumburg, Ill.) has added to the PowerPC, do amplify the serial-processing power of these general-purpose processors. But the added functionality sits on top of highly complex architectures that have evolved over decades.
Consequently, there is a cost in both dollars and watts to support these large-scale designs, because vendors of general-purpose microprocessors must address a large market outside the specific applications covered by the extended instruction sets. Moreover, other problems inevitably appear at the basic architectural level, such as nonaligned operands.
Although Intel's MMX can in principle handle 8 bytes, or eight pixels, at one time, those instructions were overlaid on a floating-point unit, where a floating-point operand was always aligned on an 8-byte boundary. When an algorithm takes a sum of products over a moving window, seven times out of eight those 8 bytes will not fall on such a boundary. A performance hit on seven of every eight accesses, in functions that are supposed to run with massive parallelism, is huge.
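The seven-out-of-eight figure follows directly from sliding an 8-pixel window one byte at a time: only one start offset in eight lands on an 8-byte boundary. A minimal sketch (illustrative only, not MMX code; the function name is hypothetical) makes the arithmetic concrete:

```python
def aligned_fraction(row_length, window=8, alignment=8):
    """Fraction of sliding-window start positions that fall on an
    aligned boundary, for a window slid one byte at a time."""
    starts = range(row_length - window + 1)
    aligned = sum(1 for s in starts if s % alignment == 0)
    return aligned / len(starts)

# For a 640-pixel scan line, roughly 1 in 8 window positions is aligned;
# the other 7 in 8 would need slower unaligned handling.
frac = aligned_fraction(640)
```

An architecture that requires aligned 8-byte loads therefore pays a penalty on the vast majority of window positions in exactly the inner loops that matter most.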
DSP chips are more specialized, targeting a specific application area, communications, but the basic operations they target are essentially serial. Two of the highest performers, the Sharc architecture from Analog Devices Inc. (Norwood, Mass.) and the C60 DSP family from Texas Instruments Inc. (Dallas), have impressive clock rates and high throughput. However, they process only one pixel at a time.
The C60 is actually a parallel processor with a very long instruction word (VLIW) architecture and compiler. But there is no way to take advantage of the inherent parallelism in an image. The C60 can process two pixels simultaneously, but it typically uses its parallelism to speed up nonobvious, dissimilar operations, such as computing a loop count and an address at the same time. The architecture is being sold specifically into communications applications, such as cellular base stations, where there is no inherent parallelism in the data.
On the other hand, completely different solutions, such as image-compression chips from Zoran Corp. (Santa Clara, Calif.) or C-Cube Microsystems Inc. (San Jose, Calif.), are built around a particular standard. They can do a very good job of implementing that standard, assuming it has been fully specified. Quite often, though, a standard is still at the draft stage, and it takes a long time to firm up.
Meanwhile, the Internet-driven need to provide the best-quality images with the fewest bytes pushes designers to invent new algorithms for which there are no standards. Such new image formats can be used effectively if they are hidden from users, who see only the output from a server.
In addition, there is much more to applications than just compression. One may want to build a smart Web camera that watches the images it captures and compresses and transmits only data that meet some criteria. This would be particularly appropriate for surveillance cameras, which currently spend most of their time recording a static scene where nothing of interest is occurring. One may also want to use algorithms that can make good use of the raw image data as it comes directly from a low-cost CMOS image sensor, rather than converting it into a standard video format.
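The "transmit only what meets some criteria" idea can be sketched with simple frame differencing. Everything here is illustrative, assuming frames arrive as flat lists of pixel values; the function names and thresholds are hypothetical, not any real camera API:

```python
def frame_changed(prev, curr, pixel_delta=16, changed_fraction=0.01):
    """True if enough pixels differ between two frames to matter:
    at least changed_fraction of pixels moved by more than pixel_delta."""
    changed = sum(1 for a, b in zip(prev, curr) if abs(a - b) > pixel_delta)
    return changed >= changed_fraction * len(curr)

def frames_to_transmit(frames):
    """Yield only frames that differ meaningfully from their predecessor;
    static surveillance scenes produce almost no output."""
    prev = None
    for frame in frames:
        if prev is None or frame_changed(prev, frame):
            yield frame
        prev = frame
```

A real smart camera would run a more robust detector (and compress before transmitting), but the control structure, watch continuously, emit rarely, is the point.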
Thus, there appears to be a need for a programmable processor that fits into this spectrum of solutions: one that can tackle image processing and manipulation in a general enough context while still maintaining performance and cost-effectiveness. That was the perception at Oxford Micro Devices when we decided to address the image-processing segment of the industry. The strategy we came up with was to look at the basic operations that need to be performed when images are manipulated or processed, and then build the processor architecture around those operations.
Building blocks
That led to a small number of basic image-processing building blocks for which the Ax36 family of image processors is optimized. To maximize flexibility and efficiency, the building blocks were implemented at the instruction set level rather than by having dedicated, special-purpose coprocessors built into the hardware. Also, the software tools were co-designed with the hardware to make the processors easy to program efficiently.
For example, the architecture solves the parallel-operand alignment problem. A specialized data cache and crossbar interconnect allow the programmer to slide a window of multiple pixels byte by byte through memory at full CPU speed. There are no byte boundaries to complicate algorithms and slow down the process. Such a basic operation will always be needed, and conventional architectures do not address the problem directly.
Another important consideration is the handling of mixed data types. These occur, for example, when multiplying 8-bit unsigned pixels by 16-bit signed coefficients, which is a basic filtering and compression operation.
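The mixed-type operation looks like this in a small FIR filter: 8-bit unsigned pixels times 16-bit signed coefficients, accumulated in a wider register so intermediate products cannot overflow, then scaled and clamped back to pixel range. This is a hedged sketch of the arithmetic, not the Ax36 instruction set; the function name and the 3-tap edge filter are illustrative:

```python
def filter_row(pixels, coeffs, shift=8):
    """Apply a short signed FIR filter to a row of unsigned 8-bit pixels.
    Products are up to 24 bits (8-bit x 16-bit), so they are accumulated
    in full precision, then scaled down and clamped to 0..255."""
    taps = len(coeffs)
    out = []
    for i in range(len(pixels) - taps + 1):
        acc = sum(pixels[i + k] * coeffs[k] for k in range(taps))
        out.append(min(255, max(0, acc >> shift)))
    return out

# A [-1, 2, -1]-style edge filter scaled into 16-bit coefficient range,
# i.e. each tap multiplied by 256 to match the shift of 8.
edge = [-256, 512, -256]
```

Hardware that provides this widen-multiply-accumulate-saturate pattern as a single operation avoids the sign-extension and overflow bookkeeping that otherwise clutters the inner loop.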
Take a problem such as video compression: the basic technique is motion estimation, so that only the novel information, what has changed in the scene, needs to be stored. Motion estimation requires huge numbers of complex best-match comparisons, typically on 16 x 16 blocks of pixels, demanding many billions of operations per second. If the architecture is built around this type of operation, the task is feasible. With other architectures it becomes an enormous challenge, because they were not designed with that job in mind.
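The best-match comparison at the heart of motion estimation is typically a sum of absolute differences (SAD) between a 16 x 16 block and candidate positions in a search window. A plain-Python sketch shows the structure (the function names are illustrative; a real encoder runs this inner loop billions of times per second, which is why architectural support matters):

```python
def sad(ref, frame, bx, by, dx, dy, block=16):
    """Sum of absolute differences between the block at (bx, by) in ref
    and the block displaced by (dx, dy) in frame."""
    total = 0
    for y in range(block):
        for x in range(block):
            total += abs(ref[by + y][bx + x] - frame[by + dy + y][bx + dx + x])
    return total

def best_motion_vector(ref, frame, bx, by, search=4, block=16):
    """Exhaustively search +/- search pixels for the lowest-SAD offset,
    i.e. the motion vector of the block from ref into frame."""
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            score = sad(ref, frame, bx, by, dx, dy, block)
            if best is None or score < best[0]:
                best = (score, dx, dy)
    return best[1], best[2]
```

Note that every candidate offset slides the comparison window by a single pixel, which is exactly the unaligned, byte-granular access pattern discussed earlier.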
Although high-end processors may not be appropriate models for this type of application, there are low-cost microprocessors that handle many of the basic operations we do not need to address with a dedicated video DSP. We expect to see our image processor sitting alongside such a microprocessor, which will run a Web browser or handle the zillion data formats needed for embedded audio and all sorts of embedded functions.
The Oxford Ax36 can be dedicated to thinking about images continuously. It is not going to be burdened by updating the file system or by grabbing 17 different URLs off the Internet. So it can be used in a targeted embedded setting where there truly is no other processor or microcontroller around, or it can work alongside existing high-end processors to offload computationally intensive tasks.
Once a designer is confident about getting on top of existing convergence demands, it becomes attractive to add new abilities to information-processing products, abilities that give them a whole new qualitative feel. Very flexible visual-input devices are a potential new market here.
Instead of being tied to a keyboard and mouse, or indeed to a touch-screen system, why not stand several feet away and point toward the screen? At that level of visual input, our chip can be programmed to recognize a finger or hand as a pointer.
That way a menu of items on the screen can be controlled purely visually.
IBM Corp. (Armonk, N.Y.) has done some work along those lines. The idea could be extended to new types of advertising displays: customers walk into a grocery store where, mounted fairly high, a small flat-panel display with a low-cost camera can acquire hand gestures. The customer could be presented with a menu of what is on sale and drive through that menu just by pointing.
That same smart camera could double as a surveillance camera, and it could also address advertisers' need to know whether an ad is effective: a smart camera could record how many people watch each ad, and for how long.
Thus, a whole realm of new input devices becomes possible when there is one hardware platform that is not just the processor but is also the sensor.
Even the characteristics of the lens become programmable when advanced image-processing algorithms are factored into the design. Those algorithms have advanced to the stage where basic physical characteristics, such as the depth of field of the lens or the apparent light sensitivity of the image sensor, can be improved in software. It all becomes programmable if the computational horsepower is available at reasonable cost and with low power dissipation.
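One concrete instance of improving a physical characteristic in software is frame averaging: summing several noisy low-light exposures raises the apparent sensitivity of the sensor, because uncorrelated sensor noise averages toward zero while the scene does not. A toy sketch, assuming frames are flat lists of pixel values (not any particular camera pipeline):

```python
def average_frames(frames):
    """Pixel-wise average of equally sized frames, rounded to integers.
    Averaging n frames reduces uncorrelated noise by roughly sqrt(n)
    while leaving the static scene content unchanged."""
    n = len(frames)
    return [round(sum(f[i] for f in frames) / n) for i in range(len(frames[0]))]
```

Depth-of-field extension works on the same principle, trading computation for optics, but requires considerably more sophisticated algorithms than this.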
The resulting "software" camera can thus work around the physical and cost factors that make it difficult to add real-time image capture and processing out in the real world. In addition, it can be programmed remotely via the Internet to watch the scenes it captures and react when it sees something interesting, and to adopt new and better compression algorithms as they come along.
Copyright © 1999 CMP Media Inc.