

To: Don Dorsey who wrote (43156)  7/18/1999 1:49:00 PM
From: John Rieman
 
MPEG encoding................................

broadcastengineering.com

MPEG coding

By Michael Robin

In 1986 a study group called the Joint Photographic Experts Group (JPEG) was formed under the auspices of the International Organization for Standardization (ISO) and the CCITT (now the ITU-T). Their task was to develop an international standard for the compression of still-frame images. The result was the JPEG standard, a compression technique applicable to single (stationary) pictures. Several mathematical techniques are used to reduce the information content by removing spatial redundancies, consequently reducing the bit rate requirements. JPEG uses a very popular spatial redundancy removal technique called the Discrete Cosine Transform (DCT). A derivative of JPEG, called Motion-JPEG, allows the storage of video on computer disks for editing applications.

MPEG, which stands for Moving Picture Experts Group, goes beyond JPEG and applies temporal compression in addition to spatial compression. The initial version, now called MPEG-1, is used to encode low-resolution pictures at data rates of about 1.5Mb/s. MPEG-2 was developed for the delivery of compressed television for home entertainment. This set of compression and systems (multiplexing) algorithms and techniques has well-defined rules and guidelines. Because of this, it allows for variations in the values assigned to many of the parameters and provides for a broad range of products and interoperability. These definitions are integrated into an MPEG toolkit, or syntax, that addresses a variety of cost vs. performance trade-offs described as levels and profiles (see Table 1). A further extension of MPEG-2, called the 4:2:2 profile, has been developed to record and transmit studio-quality video more efficiently than M-JPEG. This article is the first in a two-part series that looks at MPEG video compression (now commonly referred to as coding) concepts.

Coding video

The goal of video compression is to represent an image with as few bits as possible while preserving an appropriate level of quality for a given application. Compression is achieved by removing the redundancies in the video signal. Lossless compression techniques lose no data: the compressed signal can be decompressed to obtain an exact duplicate of the original signal. However, lossless techniques allow only modest amounts of bit rate reduction, rarely exceeding 3:1. Lossy compression techniques are irreversible. They allow for higher bit rate reductions but result in distortions and artifacts. These can be made invisible to the eye, but the changes to the original signal are permanent.

Compression systems work by eliminating redundancies in the data stream. Because redundant data need not be retransmitted, the result is a reduction of the necessary bit rate. Some of the data is simply non-essential; one example is the area outside of the active picture. CCIR 601 coding does not carry data in the vertical and horizontal blanking intervals, relying instead on the transmission of EAV and SAV timing references in place of sync data. Removing the horizontal and vertical blanking intervals allows for a bit rate reduction on the order of 55Mb/s without affecting the picture quality.
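
As a rough check of that figure, here is a sketch of the arithmetic for the 525-line, 10-bit 4:2:2 system; the exact savings depend on the wordlength assumed and on how many lines are counted as active (with 8-bit samples the result comes out closer to 48Mb/s, bracketing the 55Mb/s quoted above):

```latex
% 4:2:2 sampling: 13.5 MHz (Y) + 2 x 6.75 MHz (CB, CR) = 27 Msamples/s
\begin{align*}
R_{\mathrm{total}}  &= 27\times10^{6}\ \mathrm{samples/s} \times 10\ \mathrm{bits} = 270\ \mathrm{Mb/s}\\
R_{\mathrm{active}} &= 1440\ \tfrac{\mathrm{samples}}{\mathrm{line}} \times 486\ \mathrm{lines} \times 29.97\ \tfrac{\mathrm{frames}}{\mathrm{s}} \times 10\ \mathrm{bits} \approx 210\ \mathrm{Mb/s}\\
R_{\mathrm{saved}}  &= 270 - 210 = 60\ \mathrm{Mb/s}
\end{align*}
```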

Spatial redundancies occur when, in large areas of the picture, adjacent pixels have nearly identical values. Temporal redundancies occur when consecutive pictures are similar. Compression systems work by separating the redundant (predictable) information, which does not need to be transmitted, from the unpredictable information (entropy), which needs to be transmitted. An ideal system would transmit only the entropy and reconstitute the redundant information from a reference picture.

In addition, the limitations of the human visual system (HVS) create what is known as perceptual redundancy: the eye has reduced sensitivity to small picture details and to chroma detail. All picture details invisible to (or unnoticed by) the eye can thus be removed.

MPEG tools

The MPEG specification is best described as a collection of bit rate reduction and compression tools. Among these tools are DCT, quantizing, run length coding (RLC), variable length coding (VLC) and a buffer for smoothing the changes in data rate.

Table 1. The MPEG-2 standard provides a variety of data rates. Parameters are specified based on specific levels and profiles. Pixel counts, sampling structures and maximum data rates are shown.

The DCT is a lossless, reversible mathematical process that converts spatial amplitude data into spatial frequency data. The image is divided into blocks of eight horizontal pixels by eight vertical pixels (8x8 blocks) of luminance (Y) and corresponding color difference (CB and CR) samples. Figure 1 shows how a television picture is divided into 8x8 blocks. A block of 8x8 pixels is transformed into a block of 8x8 coefficients, each describing the amplitude of a particular spatial frequency. The upper-left coefficient represents the DC component. Moving across the top row, the horizontal spatial frequency increases; moving down the left column, the vertical spatial frequency increases. Essentially, the signal is converted into one value for the DC component and 63 values for 63 other frequencies, a process equivalent to a spectrum analysis.

The video signal has most of its energy concentrated at DC and the lower frequencies of the spectrum. The DCT process results in zero or low-level values for many of the higher spatial frequency coefficients. This process does not by itself result in a bit rate reduction; rather the opposite. Because the transform amounts to a series of multiplications and additions, it produces coefficients with a longer wordlength than the original pixel values: transforming a block of eight-bit pixels typically yields a block of 11-bit coefficients. Despite this, the DCT converts the source pixels into a form that allows for easier compression.
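
To make the transform concrete, here is a minimal NumPy sketch of the orthonormal 8x8 DCT-II; the function names are mine, not from any MPEG reference code:

```python
import numpy as np

N = 8
# 1-D DCT-II basis matrix: row k holds the cosine pattern for frequency k.
n = np.arange(N)
C = np.sqrt(2.0 / N) * np.cos((2 * n[None, :] + 1) * n[:, None] * np.pi / (2 * N))
C[0, :] /= np.sqrt(2.0)            # DC row scaling makes the transform orthonormal

def dct_8x8(block):
    """Turn an 8x8 block of pixels into 8x8 spatial-frequency coefficients."""
    return C @ block @ C.T

def idct_8x8(coeffs):
    """Inverse transform; lossless apart from arithmetic rounding."""
    return C.T @ coeffs @ C

flat = np.full((N, N), 128.0)      # a perfectly uniform block: no detail at all
print(np.round(dct_8x8(flat)))     # only the DC term (8 * 128 = 1024) is non-zero
# Note the wordlength growth the article mentions: an all-255 input gives
# DC = 8 * 255 = 2040, which needs 11 bits although the input pixels were 8-bit.
```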

Because a large number of coefficients resulting from the DCT process have zero or near-zero values, they need not be transmitted. This results in considerable compression. The ignored coefficients represent non-discernible picture details, making this part of the compression essentially lossless. Higher compression factors require a reduction of the wordlength (number of bits per sample) of the non-zero coefficients, which results in an inaccurate representation of the picture. The HVS is characterized by a reduced perception of fine picture details as well as of fine-grained noise. Fine picture detail, if present, tends to mask fine-grained noise, whereas noise in uniform picture areas is highly visible.

Figure 1. As part of the compression process, images are broken down into 8x8 pixel blocks. A standard-definition picture consists of 720 luminance pixels by 486 active lines. Because 486 is not evenly divisible by 8, a 720x480 area is coded, which breaks down into 5400 (90x60) 8x8 pixel blocks.

When analog signals are represented digitally, they are characterized by quantizing errors, which are essentially visible as noise. Long wordlengths (a large number of bits per sample) result in low noise (high SNR). Short wordlengths (a small number of bits per sample) result in an increased noise level (low SNR). Because the DCT process splits the signal into separate frequencies, it becomes possible to control the noise spectrum for minimum visibility and maximum compression. The method used is to assign more bits to low-frequency coefficients and fewer bits to high-frequency coefficients through a process of weighting.

Figure 2. Two scanning sequences are used to place frequency coefficients into a serial datastream. The sequence on the left is used for progressively scanned images, while the sequence on the right is used for interlaced images. Both result in long strings of zeros, which can be easily compressed.

In the process of weighting, DCT coefficients are divided by a value n>1 and the result is rounded to the nearest integer. The value of n varies with the position of the coefficient in the block: higher frequencies are assigned higher values of n. As a result, coefficients representing low spatial frequencies are quantized with relatively small steps and have a high SNR, while coefficients representing the higher spatial frequencies are quantized with large steps and suffer from distortion and low SNR. The weighting process is controlled by specific weighting tables, and the decoder is supplied with information as to the weighting table used.
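
A minimal sketch of this weighting step, using a made-up weight matrix (real encoders use standardized or downloaded intra/non-intra tables):

```python
import numpy as np

# Hypothetical weighting table: the divisor grows with horizontal and
# vertical frequency, so high-frequency coefficients get coarser steps.
W = 16 + 2 * (np.arange(8)[:, None] + np.arange(8)[None, :])

def weight_and_quantize(coeffs):
    """Divide each coefficient by its weight, then round to the nearest integer."""
    return np.rint(coeffs / W).astype(int)

def dequantize(q):
    """Decoder side: multiply back. The rounding error is permanent (lossy)."""
    return q * W
```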

Run length coding (RLC) is a method of coding the coefficients after reading them out in a particular order. The DCT of a non-interlaced picture, as used in MPEG-1, results in the significant coefficients being located in the top-left area of the block. Reading the values out of memory in a 45° diagonal zig-zag, as shown in Figure 2a, results in sending the non-zero coefficients first, followed by a long string of zero values. Figure 2b shows the manner in which the coefficient values are read out in the case of an interlaced picture. The RLC process efficiently encodes the sequence of DCT coefficients by sending a unique codeword in place of a long string of zeros, resulting in even more data compression.
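
The scan-plus-RLC idea can be sketched as follows. The diagonal ordering matches Figure 2a's progressive scan; the (run, value) pairing and the end-of-block marker are simplified relative to the actual MPEG tables:

```python
def zigzag_indices(n=8):
    """Visit the block along 45-degree diagonals, alternating direction."""
    return sorted(((i, j) for i in range(n) for j in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else p[1]))

def run_length_encode(block):
    """Emit (zero_run, value) pairs; a single 'EOB' replaces the trailing zeros."""
    pairs, run = [], 0
    for i, j in zigzag_indices():
        v = int(block[i][j])
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append("EOB")
    return pairs

# A block with a DC value and one non-zero AC coefficient compresses to
# just three symbols instead of 64:
blk = [[0] * 8 for _ in range(8)]
blk[0][0], blk[1][0] = 21, -3
print(run_length_encode(blk))      # [(0, 21), (1, -3), 'EOB']
```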

The DCT, requantizing and RLC processes result in certain coded values occurring more often than others, giving rise to a predominance of near-zero coefficient values. The variable length coding (VLC) process allocates short codewords to frequently occurring values (e.g. stationary picture or non-varying background) and long codewords to infrequently occurring values (e.g. varying or moving objects).
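
MPEG itself uses predefined VLC tables rather than codes computed on the fly, but the underlying principle is Huffman coding. A self-contained sketch of that principle:

```python
import heapq, itertools
from collections import Counter

def huffman_table(values):
    """Build a prefix code: frequent values get short codewords, rare ones long."""
    freq = Counter(values)
    if len(freq) == 1:                       # degenerate case: a single symbol
        return {next(iter(freq)): "0"}
    tie = itertools.count()                  # unique tiebreaker so dicts never compare
    heap = [(n, next(tie), {v: ""}) for v, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n0, _, c0 = heapq.heappop(heap)      # take the two rarest subtrees...
        n1, _, c1 = heapq.heappop(heap)
        merged = {v: "0" + bits for v, bits in c0.items()}
        merged.update({v: "1" + bits for v, bits in c1.items()})
        heapq.heappush(heap, (n0 + n1, next(tie), merged))   # ...merge and repeat
    return heap[0][2]

# Zero runs dominate a typical coefficient stream, so 0 gets the shortest code:
print(huffman_table([0] * 60 + [1] * 3 + [5]))   # {0: '1', 1: '01', 5: '00'}
```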

The amount of compressed video data is inherently variable because the content of successive video frames varies; the raw encoder output is therefore a variable bit rate. Recording and transmission channels usually require a constant bit rate, which is achieved by using a buffer. The input to the buffer varies over time while the output is read out at a constant rate. To avoid overflow or underflow of the buffer, a rate-control signal derived from the buffer occupancy adjusts the quantizer step size depending on the video content and activity. This results in a constant bit rate (CBR) but also a variable picture quality (VPQ).
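
A toy illustration of that feedback loop; the thresholds and step limits here are invented for the sketch, not taken from the standard:

```python
def adjust_quantizer_step(buffer_bits, buffer_capacity, step,
                          step_min=1, step_max=31):
    """Leaky-bucket-style rate control: quantize more coarsely as the buffer fills.

    The encoder would call this after each picture (or macroblock row); a
    fuller buffer means the channel is falling behind, so picture quality
    is traded for bits.
    """
    occupancy = buffer_bits / buffer_capacity
    if occupancy > 0.8:                 # nearing overflow: cut the bit rate
        step = min(step + 2, step_max)
    elif occupancy < 0.2:               # nearing underflow: spend bits on quality
        step = max(step - 1, step_min)
    return step
```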

With many of the basics covered, next month we will look at how the MPEG datastreams are assembled.