Wednesday, May 4, 2011


Because image/video frames are dominated by a mixture of stationary low-frequency backgrounds and transient high-frequency edges, a wavelet transform is very efficient at capturing the bulk of the image energy in a small fraction of the coefficients, which facilitates compression. Wavelet-based image compression techniques such as zerotree coding or EBCOT offer excellent scalability and rate-distortion (R-D) performance for robust multimedia transmission over wireless channels. The wavelet decomposition is illustrated in Figure 1, where the energy concentration in the low-frequency bands facilitates the construction of embedded code streams. The embedded nature of the compressed code stream provides the basis for scalable video/image coding by fine-tuning the R-D trade-off: the encoder can stop as soon as a target bit-rate is met, or the decoder can stop at any desired lower bit-rate by truncating the code stream. Typically, the code stream is composed of quality layers in descending order of importance, with the base layer providing a rough image and the enhancement layers providing quality refinement. Different layers in the code stream therefore have significantly different perceptual importance to end users: losing the base layer may cause serious distortion in the reconstructed pictures, while losing quality enhancement layers still yields acceptable picture quality.

Figure 1: Original Lena image (128×128 pixels, 8 bpp), wavelet decomposition and reconstruction.
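As an illustration of why wavelet transforms suit compression, the following minimal Python sketch applies one level of the orthonormal Haar transform to a piecewise-smooth signal. Practical codecs use longer filters (e.g., CDF 9/7) and 2-D separable transforms, so this is only a toy demonstration of energy compaction; the signal values are made-up examples.

```python
def haar_1d(signal):
    """One level of the orthonormal 1-D Haar transform (length must be even)."""
    s = 2 ** 0.5
    low = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return low, high

def energy(coeffs):
    return sum(c * c for c in coeffs)

# A smooth "background" signal with one sharp edge, as in natural images.
x = [10.0] * 7 + [50.0] * 9
low, high = haar_1d(x)

# Fraction of total energy captured by the low-frequency band.
ratio = energy(low) / (energy(low) + energy(high))
```

Here `ratio` exceeds 0.96: half as many coefficients carry almost all the energy, which is exactly the property embedded coders exploit when they transmit significant coefficients first.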


MPEG-4, introduced in 1998, was standardized by ISO/IEC MPEG as ISO/IEC 14496 and was primarily aimed at low bit-rate video applications over networks. The layer-based quality enhancement concept has been widely applied in the scalable video coding (SVC) extension of MPEG-4 Part 10 (H.264/advanced video coding, AVC) and in the JPEG2000 progressive image coding standard. This flexibility in source coding bit-rate has laid the foundation for a number of emerging multimedia applications over bandwidth-limited wireless networks, such as IPTV, video on demand, and online video gaming. Both SVC in MPEG-4 and quality progression in JPEG2000 provide a considerable advantage for error-robust multimedia streaming over time-varying wireless channels, especially in mobile environments (e.g., mobile WiMAX networks). Without loss of generality, we use MPEG-4 video coding in this section and JPEG2000 in Section 1 as multimedia coding examples.
The video sources are coded into multiple quality layers via SVC, starting with rough pictures at low bit-rates followed by higher-layer refinement data for quality enhancement at higher bit-rates. The rough pictures in the base layer are much more important perceptually than the refinement data in the enhancement layers, and therefore deserve stronger protection during transmission over wireless channels; the refinement data in the enhancement layers can be discarded during transmission when bandwidth is limited. For each wireless mobile terminal in a mobile WiMAX network, for example a moving vehicle, the available transmission bandwidth fluctuates with the vehicle's location, the corresponding path loss, and the channel errors. With SVC applied to each video stream on each wireless terminal, the actual traffic injected into the WiMAX network from the application layer can be adaptively controlled in a rate-distortion optimized manner: if the available bandwidth is low due to a high channel error probability, only the base layers containing the rough pictures are transmitted; when the channel condition improves and more bandwidth becomes available, both base layers and refinement layers are transmitted to improve the perceptual quality.
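The adaptive layer selection just described can be sketched as follows. The layer sizes, bandwidth figures, and function name are illustrative assumptions, not part of any standard.

```python
def select_layers(layer_sizes_kbps, available_kbps):
    """Send layers in order (base layer first) until the bandwidth budget
    is exhausted. Layers must go in order because each enhancement layer
    depends on the layers before it."""
    chosen, used = [], 0
    for size in layer_sizes_kbps:
        if used + size > available_kbps:
            break                      # later layers depend on earlier ones
        chosen.append(size)
        used += size
    return chosen

layers = [200, 150, 150, 100]          # base layer first, then refinements
poor_channel = select_layers(layers, 260)   # only the base layer fits
good_channel = select_layers(layers, 550)   # base plus two refinements fit
```

The key point is the `break`: once a layer does not fit, no later layer is sent, because without its predecessors it would be undecodable anyway.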
Another important factor in multimedia streaming is inter-packet dependency. The dependency graph of multimedia packets is typically composed of packetized groups of pictures, and the coding of multimedia packets is correlated, involving complex dependencies among those packets. If a set of image/video packets is received, only the packets whose ancestors have all been received can be decoded. Figure 2 illustrates the typical code stream dependency for a layer-based embedded media stream. The inter-packet dependency provides opportunities for resource allocation and adaptation for multimedia streaming over WiMAX, where packets with more descendants are much more important than their descendants. For example, for the layer-dependent media packets in Figure 2, each packet is associated with a distortion reduction value denoting the quality gain if the packet is successfully received and decoded. For the packets in layer 2 to contribute to the decoded media, all the packets in layer 0 and layer 1 must be received and decoded successfully; otherwise the packets in layer 2 are useless for decoding even if they are transmitted without a single bit error.
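The ancestor rule above can be sketched as a small dependency check. The packet ids, the parent map, and the assumption that ids are topologically ordered are all illustrative.

```python
def decodable(received, parents):
    """Return the subset of received packets whose ancestors were also all
    received (and hence decodable). Assumes packet ids follow coding order,
    so ancestors always have smaller ids than their descendants."""
    ok = set()
    for p in sorted(received):
        if all(a in ok for a in parents.get(p, [])):
            ok.add(p)
    return ok

# A four-layer chain: packet 0 (base) <- 1 <- 2 <- 3 (top refinement).
parents = {1: [0], 2: [1], 3: [2]}
```

With this chain, receiving packets {0, 1, 3} yields only {0, 1} as decodable: packet 3 arrived intact but is useless because its ancestor 2 was lost, matching the observation above.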

Layer Based Multimedia Decoding:
Step 1:
           CumulativeSuccess=TRUE; iteration=0;
Step 2:
           While ( iteration < number of layers ) {
                     Decode the layer (denoted by iteration).
                     if (decoding succeeded)
                               Then CumulativeSuccess=TRUE; iteration=iteration+1;
                               Else CumulativeSuccess=FALSE; break;
           }
Step 3:
           Output the decoded stream up to layer iteration.

IBP Based Video Decoding:
Step 1:
           Decode the I frame. If it fails, return;
Step 2:
           Find and decode the next P frame.
           If it fails, go to Step 4; Otherwise pPFrame = the found P frame.
Step 3:
           While ( pPFrame != NULL ) {
                     Decode the B frames ahead of pPFrame;
                     Find and decode the next P frame.
                     If decoding fails, break;
                     if ( the current P frame == the last P frame ) pPFrame = NULL; Else pPFrame = the newly decoded P frame;
           }
Step 4:
           Output the decoded stream.
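The IBP decoding steps above can be sketched in Python under a simplified dependency model: each P frame depends on the previous anchor (I or P) frame, and each B frame depends on its two surrounding anchors. The frame labels and the model itself are illustrative assumptions, not an exact decoder.

```python
def decode_gop(frames, failed):
    """frames: 'I'/'P'/'B' labels in display order; failed: indices lost
    in transit. Returns the set of indices that can still be decoded."""
    decoded = set()
    last_anchor_ok = False
    # Pass 1: anchors. A P frame needs the previous anchor decoded.
    for i, f in enumerate(frames):
        if f in ('I', 'P'):
            ok = (f == 'I' or last_anchor_ok) and i not in failed
            last_anchor_ok = ok
            if ok:
                decoded.add(i)
    # Pass 2: B frames need both surrounding anchors decoded.
    anchors = [i for i, f in enumerate(frames) if f in ('I', 'P')]
    for i, f in enumerate(frames):
        if f == 'B':
            prev = max((a for a in anchors if a < i), default=None)
            nxt = min((a for a in anchors if a > i), default=None)
            if prev in decoded and nxt in decoded:
                decoded.add(i)
    return decoded

gop = ['I', 'B', 'B', 'P', 'B', 'B', 'P']
```

Losing the P frame at index 3 leaves only the I frame decodable: every later anchor and every B frame depends, directly or indirectly, on the lost frame.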

Figure 2: Typical packet dependency of layer based embedded media stream structure.
Based on this analysis of unequal importance and inter-packet dependency, UEP-based resource allocation strategies can be generalized for image/video streaming over wireless channels: network resource allocation and adaptation are applied to the media stream according to the distortion reduction (importance) of each packet and the inter-packet dependency. Ancestor packets with more descendants are protected with more network resources, including stronger FEC capability, more robust modulation schemes, higher ARQ retry limits, etc.; packets with fewer descendants are protected less to save communication resources.
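One possible, purely illustrative realization of this UEP rule maps a packet's descendant count to an FEC parity budget. The linear mapping, the parameter names, and the parity range are assumptions, not a prescribed scheme.

```python
def count_descendants(packet, children):
    """Count every packet that depends (transitively) on `packet`."""
    stack, seen = [packet], set()
    while stack:
        for c in children.get(stack.pop(), []):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return len(seen)

def assign_parity(packets, children, max_parity=8):
    """Illustrative UEP: more descendants -> more FEC parity symbols,
    linearly scaled between 1 and max_parity."""
    counts = {p: count_descendants(p, children) for p in packets}
    top = max(counts.values()) or 1
    return {p: 1 + (max_parity - 1) * counts[p] // top for p in packets}

children = {0: [1], 1: [2], 2: []}     # a three-layer chain: 0 <- 1 <- 2
parity = assign_parity([0, 1, 2], children)
```

The base packet, on which everything else depends, receives the strongest protection, while the leaf refinement packet receives the minimum; any monotone mapping from descendant count to protection strength would serve the same purpose.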


Besides the layer-based quality scalability, wavelet-based image compression also produces shape and position information for the regions or objects in the picture, as well as the lighting magnitude values describing those regions or objects. Without loss of generality, we use wavelet-based image coding as an example of multimedia content. The shapes or regions of the objects in the picture are much more important than the lighting magnitude values of those objects: errors in shape and region information lead to high distortion of the reconstructed images, while errors in pixel magnitudes are more tolerable during transmission and decoding. This is because the shape and region information affects how the magnitude values associated with those regions are interpreted when the image is perceived by end users. Furthermore, the shape and region information can be translated into position information segments (p-segments) and the lighting magnitude information into value information segments (v-segments) by wavelet-based progressive compression in each quality layer. The p-segments denote how small-magnitude (insignificant) wavelet coefficients are clustered, while the v-segments denote the values of large-magnitude wavelet coefficients. Layers in the code stream represent the manner of quality improvement, while the p-segments and v-segments in each layer represent the data dependency. The p-segments and v-segments can be easily identified in zerotree-based or EBCOT-based code streams. The final code stream structure is composed of p-segments and v-segments in decreasing order of importance, as shown in Figure 3.

Figure 3: Code stream format for scalable quality layers and position-value separation in each layer.
We now describe how to separate p-segments and v-segments in each quality layer. Zerotree-based compression techniques generally involve a dominant coding pass to extract the tree structures and a subordinate pass to refine the leaves on the tree. These coding passes are invoked layer by layer in a bit-plane progressive manner. In the dominant pass of each zerotree bit-plane coding loop, a threshold δ, which halves from one layer to the next, is specified. A wavelet coefficient is encoded as a positive or negative significant symbol if its magnitude exceeds δ, with the sign determined by the sign of the coefficient. A coefficient is encoded as a zerotree root if its magnitude and all its descendants' magnitudes are below δ and it is not itself a descendant of a previously coded zerotree root; otherwise the coefficient is encoded as an isolated zero. Because the positive and negative significant symbols, isolated zeros, and zerotree root symbols all carry tree structure information, they are put into the p-segment of the current bit-plane layer. The subordinate pass is then invoked for magnitude refinement: the magnitude bit of each positive or negative significant symbol is determined according to the threshold δ and put into the v-segment of that bit-plane layer. Thus, p-segments and v-segments are formed layer by layer as the threshold δ halves. Because p-segments contain zerotree structures and v-segments contain magnitude values, incorrect symbols in p-segments cause subsequent bits to be misinterpreted in the decoding process, while incorrect bits in v-segments are far more tolerable.
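A highly simplified sketch of this p/v separation follows. It keeps the halving threshold and the dominant/subordinate split but omits the actual tree grouping, so the 'Z' symbol below merely stands in for the zerotree-root/isolated-zero distinction; the coefficient values are made-up examples.

```python
def bitplane_pass(coeffs, threshold, significant):
    """One bit-plane loop: structure symbols go to the p-segment,
    magnitude refinement bits go to the v-segment."""
    p_segment, v_segment = [], []
    for i, c in enumerate(coeffs):
        if i in significant:
            # subordinate (refinement) pass: one magnitude bit at this threshold
            v_segment.append(1 if abs(c) % (2 * threshold) >= threshold else 0)
        elif abs(c) >= threshold:
            # dominant pass: newly significant, record sign and position
            p_segment.append(('+', i) if c >= 0 else ('-', i))
            significant.add(i)
        else:
            # placeholder for zerotree root / isolated zero symbols
            p_segment.append(('Z', i))
    return p_segment, v_segment

coeffs = [34, -20, 6, 3]
significant = set()
p1, v1 = bitplane_pass(coeffs, 32, significant)   # first threshold
p2, v2 = bitplane_pass(coeffs, 16, significant)   # threshold halves
```

In the first pass only 34 becomes significant, so the p-segment carries its sign and position while the v-segment is still empty; in the second pass 34 receives a refinement bit (v-segment) while -20 newly enters the p-segment, exactly the layer-by-layer formation described above.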
EBCOT-based JPEG2000 is a two-tiered wavelet coder, with embedded block coding in tier-1 and rate-distortion optimization in tier-2. Without loss of generality, we discuss the coding process in only one code block (tier-1), because the p-segment and v-segment separation interacts with context formation (CF) and arithmetic coding (AC) within tier-1 intra-code-block coding, and the p-segments and v-segments in all other code blocks can be separated in the same way. Unlike zerotree compression's coefficient-by-coefficient coding, the JPEG2000 tier-1 coder processes each code block bit-plane by bit-plane from the most significant bit (MSB) to the least significant bit (LSB) after quantization. The p-segments and v-segments are likewise formed bit-plane by bit-plane in an embedded manner. In each bit-plane coding loop, each quantized sample bit is scanned and encoded in one of the significance propagation pass, the magnitude refinement pass, and the cleanup pass. In the significance propagation pass, if a sample is insignificant (a "0" bit in the current bit-plane) but has at least one immediate significant context neighbor (at least one "1" bit in the current or a previous bit-plane), the zero coding (ZC) and sign coding (SC) primitives are invoked according to one of the 19 contexts defined in JPEG2000. The output codeword of this sample after ZC and SC is put into the p-segment of this bit-plane, because the coded sample determines the positions of neighboring significant coefficients; in other words, it determines the structure of the code stream. The p-segment of this bit-plane is thus partly formed after the significance propagation pass. In the following magnitude refinement pass, the already-significant sample bits are scanned and processed: if a sample is already significant, the magnitude refinement (MR) primitive is invoked according to the context of the eight immediate neighbors' significance states.
The codeword after MR is put into the v-segment of this bit-plane because it denotes magnitude information only and contains no position information about the significant samples. After the magnitude refinement pass, the v-segment of this bit-plane is completely formed. In the final cleanup pass, all samples left uncoded by the first two passes are coded by invoking the ZC and run-length coding (RLC) primitives according to each sample's context. The codewords produced by the cleanup pass are put into the p-segment of this bit-plane, because the positions of significant samples are determined by how the insignificant wavelet coefficients are clustered. At this point the p-segment of this bit-plane is also complete. The following bit-planes are then scanned, and all the p-segments and v-segments are formed bit-plane by bit-plane.
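The routing of a sample bit to one of the three passes, and hence to a p- or v-segment, can be summarized in a small sketch. Context formation and the MQ arithmetic coder are deliberately omitted; only the pass-membership decision described above is shown, and the function name is an illustrative assumption.

```python
def route_bit(is_significant, has_significant_neighbor):
    """Route one sample bit of the current bit-plane to its coding pass
    and to the segment its output codeword belongs to."""
    if not is_significant and has_significant_neighbor:
        # ZC/SC output: position/structure information
        return ('significance propagation', 'p-segment')
    if is_significant:
        # MR output: magnitude information only
        return ('magnitude refinement', 'v-segment')
    # remaining samples, coded with ZC/RLC: structure information
    return ('cleanup', 'p-segment')
```

Note that only the magnitude refinement pass feeds the v-segment; both of the other passes emit structure information and therefore feed the p-segment, which is why the p-segment of a bit-plane is completed in two stages.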
UEP-based resource allocation with position-value enhancement is similar to layer-based resource allocation, where packets in base layers are protected more reliably than packets in quality refinement layers. The difference from layer-based UEP is that, within each quality layer, position packets are protected more reliably than value packets. The network resource allocation strategy is then optimized according to the distortion reduction of each packet and the computed dependency graph among the packets: some position packets may have low distortion reduction values but many descendants, and these packets are protected more strongly; value packets with low distortion reduction and few descendants are protected less reliably to save communication resources.