November 17, 1997, Issue: 981 Section: Multimedia Design
------------------------------------------------------------------------
Right media mix feeds consumer needs
By Hemant Bheda, Vice President of Engineering, Mediamatics, Fremont, Calif.
The personal computer is increasingly becoming an integral part of the home. More and more, PC technology is moving out of engineering and business environments and into a growing number of living rooms.
In this fast-growing market, however, the biggest requirement placed on this breed of PC is cost-effective delivery of consumer-quality video. Currently, MPEG-2 video, which has four times the resolution of MPEG-1, falls short when implemented in software. Even with multimedia-extension (MMX) instruction sets, MPEG-2 software decoders still do not provide the proper levels of video performance. As the industry moves from today's MPEG-2 technology to advanced television or high-definition TV (HDTV), video-resolution requirements will be six times those of MPEG-2. As a result, a cost-effective software-only implementation of HDTV will be nearly impossible.
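As a rough sanity check on those ratios, assuming the common frame dimensions for each format (352 x 240 for MPEG-1, 720 x 480 for MPEG-2 and 1,920 x 1,080 for HDTV; none of these figures is spelled out above):

#include <stdio.h>

/* Back-of-envelope check of the resolution ratios cited in the text.
 * The frame sizes are typical values for each format, assumed here
 * for illustration. */
int main(void) {
    double mpeg1 = 352.0 * 240.0;    /* typical MPEG-1 SIF frame   */
    double mpeg2 = 720.0 * 480.0;    /* typical MPEG-2 SDTV frame  */
    double hdtv  = 1920.0 * 1080.0;  /* full-resolution HDTV frame */

    printf("MPEG-2 / MPEG-1 pixels: %.1fx\n", mpeg2 / mpeg1); /* ~4.1x */
    printf("HDTV   / MPEG-2 pixels: %.1fx\n", hdtv / mpeg2);  /* ~6.0x */
    return 0;
}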
Today, many system designers and OEM engineering managers operate under the false assumption that 30 frames/second is acceptable video quality for these emerging consumer-electronics systems and entertainment PCs. But in actuality, consumer-quality video cannot be defined as just full-motion 30-frames/s video.
On the contrary, a host of important factors defines it. First, consumer-quality video requires the delivery of sustained and uniform 30 frames/s. Second, it requires tear-free delivery. Tearing occurs when a display update is performed in the middle of the display's sync interval, with the two unsynchronized; the result is that a partially updated frame appears on screen. This particular video flaw often occurs on a PC during software-only decode.
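A minimal sketch of the flip discipline that prevents tearing is shown below. The vertical-blank and scan-out routines are hypothetical stand-ins for whatever primitives the real graphics driver exposes; the point is only that the buffer swap happens during vertical blanking, never while the screen is being scanned out.

#include <stdint.h>

typedef struct { uint8_t *pixels; } frame_t;

/* Hypothetical driver hooks -- stand-ins for real vsync/flip primitives. */
static void wait_for_vertical_blank(void) { /* poll the vblank status bit */ }
static void set_scanout_buffer(frame_t *f) { (void)f; /* program scan-out base */ }

static frame_t buffers[2];
static int front = 0;

/* Decode into the off-screen buffer, then flip only during vertical
 * blanking, so the display never scans out a half-updated frame. */
static void present_decoded_frame(void (*decode_into)(frame_t *)) {
    int back = 1 - front;
    decode_into(&buffers[back]);  /* CPU writes while display shows front */
    wait_for_vertical_blank();    /* synchronize update with display sync */
    set_scanout_buffer(&buffers[back]);
    front = back;
}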
Third, consumer-quality video involves the elimination of so-called interlaced-video artifacts. Since most MPEG-2 content is generated for TV displays, it is interlaced: each of the 30 video frames consists of two fields captured about 16 ms apart. When the two fields are displayed together as a single frame on a PC, which uses a progressive display, an annoying combing effect or serrated motion created during that 16-ms interval becomes visible. Hence, interlaced TV content must be converted to progressive format to eliminate the artifact.
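The simplest such conversion builds a full progressive frame from a single field, a so-called bob deinterlacer. The sketch below (luma only, 8-bit) is purely illustrative; production deinterlacers of the high-quality kind discussed later are motion-adaptive.

#include <stdint.h>
#include <string.h>

/* "Bob" deinterlacing sketch: synthesize a progressive frame from one
 * field by averaging the lines above and below each missing line. This
 * avoids the combing produced by weaving two fields captured 16 ms
 * apart. */
void bob_deinterlace(const uint8_t *field, int width, int field_lines,
                     uint8_t *frame /* width x (2 * field_lines) */) {
    for (int y = 0; y < field_lines; y++) {
        /* Copy the field line into its position in the frame... */
        memcpy(frame + (2 * y) * width, field + y * width, width);
        /* ...then synthesize the missing line between field lines. */
        const uint8_t *above = field + y * width;
        const uint8_t *below = (y + 1 < field_lines)
                             ? field + (y + 1) * width : above;
        uint8_t *out = frame + (2 * y + 1) * width;
        for (int x = 0; x < width; x++)
            out[x] = (uint8_t)((above[x] + below[x] + 1) / 2);
    }
}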
Finally, consumer-quality video requires spatial resampling to maintain the correct aspect ratio on the display. While TV video pixels are rectangular, PC VGA pixels are square, so the correct aspect ratio must be factored in when decoded video is displayed on a PC.
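For example, a 720 x 480 frame intended for a 4:3 display must be resampled to 640 x 480 to look correct on square pixels. The frame size below is an assumed, typical DVD value.

#include <stdio.h>

/* Aspect-ratio correction sketch: a 720 x 480 MPEG-2 frame uses
 * non-square TV pixels, so displaying it 1:1 on a square-pixel VGA
 * screen distorts it. The square-pixel width follows from the height
 * and the display aspect ratio. */
int main(void) {
    int src_w = 720, src_h = 480;        /* typical DVD frame (assumed)  */
    double display_aspect = 4.0 / 3.0;   /* signaled in the MPEG-2 stream */

    int dst_h = src_h;                                /* keep the lines   */
    int dst_w = (int)(dst_h * display_aspect + 0.5);  /* 640 square pixels */

    printf("resample %dx%d -> %dx%d (pixel aspect %.3f)\n",
           src_w, src_h, dst_w, dst_h, (double)dst_w / src_w);
    return 0;
}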
The main issue that system engineers face in designing for consumer and entertainment PCs is the ability to cost-effectively achieve consumer-quality video. Current software-only MPEG-2 designs, even those using a 200-MHz MMX CPU, can only produce 20 to 22 frames/s.
Moreover, today's graphics controllers cannot handle TV's interlaced video. As a result, interlaced video artifacts, nonuniform video delivery, video tearing and poor scaling quality occur.
A key consideration here is that the system bottlenecks associated with trying to achieve consumer-quality video involve both compute-bandwidth and memory-bandwidth limitations. To resolve these issues, system designers must use balanced hardware and software partitioning. Moving the memory-intensive portions of the decoding functions into hardware alleviates the memory-bandwidth portion of the system bottleneck, providing the headroom necessary to process a sustained 30 frames/s.
One possible implementation of this balanced hardware/software partitioning involves moving the motion compensation part of the MPEG-decoding algorithm into the graphics controller. This eliminates the CPU-to-systems memory bandwidth bottleneck.
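In rough outline, motion compensation fetches, for each macroblock, a prediction block from the reference frame at an offset given by the motion vector. The sketch below is integer-pel only (real MPEG-2 also averages neighboring pixels for half-pel vectors) and shows why the stage is memory-bound: the reads scatter across a reference frame far larger than the CPU cache.

#include <stdint.h>

/* Motion-compensation sketch: copy a 16 x 16 prediction from the
 * reference frame at the position offset by the motion vector. Bounds
 * checking and half-pel interpolation are omitted for brevity. */
void motion_compensate_block(const uint8_t *ref, int stride,
                             int mb_x, int mb_y,  /* macroblock origin */
                             int mv_x, int mv_y,  /* motion vector     */
                             uint8_t pred[16][16]) {
    const uint8_t *src = ref + (mb_y + mv_y) * stride + (mb_x + mv_x);
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            pred[y][x] = src[y * stride + x];
}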
High-quality deinterlacing, scaling and sub-picture alpha blending are also implemented in the graphics controller as part of this balanced hardware/software partitioning. Implementing these functions in silicon creates MPEG-2-accelerated graphics controllers.
There are several major considerations involved in moving software functions into hardware. Above all, balanced partitioning must be viewed as a system-level architecture rather than an assortment of changes. Involved here are such considerations as the optimal transfer of data over the PCI bus or Accelerated Graphics Port (AGP); minimizing the amount of data that is copied and reducing the total amount of memory required; ensuring full concurrency between hardware and software; delivering hardware-quality, tear-free video; and ensuring the hardware/software partitioning works seamlessly with such Microsoft-defined application programming interfaces (APIs) as DirectDraw and DirectShow.
The Motion Video Collaborative Compression Architecture (MVCCA) developed by Mediamatics is an example of such a system-level architecture. Incoming MPEG-2/DVD packetized data consists of audio, video, sub-picture graphics and control information. It first goes through the splitter, demux and depacketization. Control data goes to the navigation manager for user interface purposes.
Audio data goes through a software-based audio decoder. If a six-channel source is involved and the PC has only two speakers, the software converts the data from six channels to two using a 3-D audio algorithm. The video decoder is divided between hardware and software, with the hardware performing motion compensation. Sub-picture graphics decoding is performed in software, but the blending of the decoded sub-picture graphics with the video is performed in hardware by the graphics controller. The output is then sent to the video-rendering process.
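A plausible sketch of such a six-to-two-channel downmix follows. The coefficients are the conventional minus-3-dB center/surround weights, chosen purely for illustration; Mediamatics' actual 3-D audio algorithm is not described here and would additionally apply spatialization filters.

#include <math.h>

typedef struct { float L, R, C, LFE, Ls, Rs; } six_ch_sample;

/* Downmix n six-channel samples to stereo. Center and surrounds are
 * folded in at -3 dB; the LFE channel is dropped, a common choice. */
void downmix_to_stereo(const six_ch_sample *in,
                       float *left, float *right, int n) {
    const float g = (float)(1.0 / sqrt(2.0));  /* -3 dB gain */
    for (int i = 0; i < n; i++) {
        left[i]  = in[i].L + g * in[i].C + g * in[i].Ls;
        right[i] = in[i].R + g * in[i].C + g * in[i].Rs;
    }
}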
In the video subsection of this architecture, there are different elements associated with decoding the compressed video into a video frame. Here, variable length decoding (VLD), inverse quantization (IQ) and inverse discrete cosine transform (IDCT) are all performed in software. These functions are compute bound rather than memory bound; therefore, the time required to perform these processes scales linearly with the processing speed of the CPU.
Take, for example, a comparison between a 133-MHz MMX CPU and a 266-MHz MMX one. The time required to perform compute-intensive processes (such as VLD, IQ and IDCT) on the 266-MHz CPU would be about half of that required on the 133-MHz version. Thus, performance benefits for these types of functions are directly tied to increased CPU speeds.
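For reference, the direct form of the 8 x 8 IDCT illustrates why this stage scales with clock speed: it is pure arithmetic on a block that fits in cache. Production decoders use fast factorizations and MMX rather than this slow textbook form.

#include <math.h>

#define PI 3.14159265358979323846

/* Reference (slow) 8 x 8 inverse DCT, one of the compute-bound stages
 * kept in software in this partitioning. Every output pixel is a
 * weighted sum of all 64 input coefficients. */
void idct_8x8(const double in[8][8], double out[8][8]) {
    for (int y = 0; y < 8; y++) {
        for (int x = 0; x < 8; x++) {
            double sum = 0.0;
            for (int v = 0; v < 8; v++) {
                for (int u = 0; u < 8; u++) {
                    double cu = (u == 0) ? 1.0 / sqrt(2.0) : 1.0;
                    double cv = (v == 0) ? 1.0 / sqrt(2.0) : 1.0;
                    sum += cu * cv * in[v][u]
                         * cos((2 * x + 1) * u * PI / 16.0)
                         * cos((2 * y + 1) * v * PI / 16.0);
                }
            }
            out[y][x] = sum / 4.0;
        }
    }
}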
However, video-decoding functions such as motion compensation and block reconstruction are memory bound. Because of the way existing cache subsystems are designed, these functions make inefficient use of the available system-memory bandwidth, a problem worsened by the fact that memory bandwidth does not scale linearly with CPU speed.
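Block reconstruction itself is computationally trivial, as the sketch below shows (simplified to a single 16 x 16 luma block per macroblock; real MPEG-2 works on 8 x 8 residual blocks). The cost lies in streaming prediction and destination pixels through a cache that gets no reuse from them.

#include <stdint.h>

/* Clamp a reconstructed value to the 8-bit pixel range. */
static uint8_t clamp255(int v) {
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Add the IDCT residual to the motion-compensated prediction and
 * saturate -- the memory-bound half of video decoding. */
void reconstruct_block(const uint8_t pred[16][16],
                       const int16_t residual[16][16],
                       uint8_t out[16][16]) {
    for (int y = 0; y < 16; y++)
        for (int x = 0; x < 16; x++)
            out[y][x] = clamp255(pred[y][x] + residual[y][x]);
}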
Sub-picture decoding is also best performed in software. Output of the decoded video frames is blended with the sub-picture data using the sub-picture blender, which is in hardware. That video data then goes through the process of color-space conversion (CSC), high-quality scaling and deinterlacing before it goes to the display.
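A simplified view of that blend follows. DVD sub-pictures carry a per-pixel contrast value that weights sub-picture against video; the 4-bit alpha range and byte-per-pixel layout below are illustrative assumptions, not the controller's actual data path.

#include <stdint.h>

/* Sub-picture blend sketch: mix video and sub-picture pixels under a
 * 4-bit per-pixel contrast (alpha) value, as the graphics controller
 * does in hardware in this partitioning. */
void blend_subpicture(const uint8_t *video, const uint8_t *sub,
                      const uint8_t *alpha /* 0..15 */,
                      uint8_t *out, int n) {
    for (int i = 0; i < n; i++) {
        int a = alpha[i] & 0x0F;
        out[i] = (uint8_t)((video[i] * (16 - a) + sub[i] * a) / 16);
    }
}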
As a result, this new system architecture defines a new breed of MPEG-2 hardware-based accelerated graphics controllers. These new controllers now include MPEG motion compensation, VGA/2-D/3-D graphics functions, CSC and sub-picture blend. The motion-compensation function interfaces to a PCI bus and to a memory-management unit (MMU) block, which is part of the graphics controller.
PCI bandwidth
MVCCA also uses less bus bandwidth than completely software-based video decoding. Maximum PCI bandwidth is 16.5 Mbytes/s; average PCI bandwidth is less than 10 Mbytes/s; and system-memory usage is 1.5 Mbytes.
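For scale, assuming a typical 720 x 480 frame in 4:2:0 format at 30 frames/s (neither figure is given above), a stream of fully decoded frames would occupy about 15.5 Mbytes/s, the same order as the quoted 16.5-Mbyte/s peak; the sub-10-Mbyte/s average reflects sending compressed data rather than raw pixels.

#include <stdio.h>

/* Back-of-envelope bus-bandwidth check under the assumptions stated
 * in the text above: 720 x 480 pixels, 1.5 bytes/pixel (4:2:0 YUV),
 * 30 frames/s. */
int main(void) {
    double bytes_per_frame = 720 * 480 * 1.5;          /* one 4:2:0 frame */
    double mbytes_per_sec = bytes_per_frame * 30 / 1e6;
    printf("raw 4:2:0 frames at 30 frames/s: %.1f Mbytes/s\n",
           mbytes_per_sec);                            /* ~15.6 Mbytes/s */
    return 0;
}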
So far as performance is concerned, with a 200-MHz MMX CPU but no MPEG-2-accelerated graphics controller, a PC can produce only 22 frames/s and leaves no CPU power in reserve. With an MPEG-2-accelerated graphics controller, video decoding runs at a full 30 frames/s with 10 percent of the CPU power left over. With a 266-MHz MMX CPU, the nonaccelerated-graphics-controller approach yields 25 frames/s and, once again, leaves no spare CPU power. However, MVCCA used in a similar setup takes video decoding to a full 30 frames/s, this time leaving 25 percent of the CPU power to spare.
Use of a balanced system architecture not only delivers consumer-quality video, but also frees system resources so a variety of other applications can run concurrently.
Copyright (c) 1997 CMP Media Inc.