Digital Video Coding

The major objective of video coding is to represent media with as few bits as possible while preserving the level of quality required for a given application.

  • Reduction of storage requirement for data storage
  • Reduction of channel bandwidth required for data transmission

The Layered Architecture for Digital TV/OTT (Over-The-Top) Video

img

  • Production and Display Layer
  • Video/Audio Compression Layer
  • Data Transport Layer
  • Transmission Layer

Display Layer

Picture formats

  • Standard definition (720p)
    • Y: 1280×720
    • UV: 640×360 (4:2:0)
    • Application: Original digitization format used in SDTV
  • High definition 4:2:0 (1080p) or 4:2:2
    • Y: 1920×1080
    • UV: 960×540 (4:2:0)
    • Application: Original digitization format used in HDTV
  • Ultra 4K (2160p)
    • Y: 3840×2160
    • UV: 1920×1080 (4:2:0)
    • Application: in the fields of digital television and digital cinematography
  • Ultra 8K (4320p)
    • Y: 7680×4320
    • UV: 3840×2160 (4:2:0)
    • Application: NHK launched a three-year roadmap entailing 8K test broadcasting in 2016, with plans to roll out full 8K services by 2018, in time for the 2020 Summer Olympics.

High-Definition Television (HDTV) Resolutions

img

Video/Audio Compression Layer

Video Compression Technologies

Digital video contains a tremendous amount of information.

  • It is impractical to transmit over the air or the Internet, or to store cost-effectively, without compression.
  • Every digital video coding system uses compression to reduce the file size when transporting video over a network for storage and viewing, while having as little impact on the image and video quality as possible.
  • Compression saves money.
  • Compression removes redundancy from the data prior to transmission or storage.
img

Often, the compression formats are referred to as “codecs.” Codec is the abbreviated form of “compressor-decompressor” or “coding-decoding.”

$$\text{Compression Ratio } (CR)=\frac{\text{size of source data}}{\text{size of compressed bitstream}}$$

We should consider not only the compression ratio but also the quality.

  • Lossless compression (Lower compression ratio)
    • The aim is to reduce the number of bits used to represent the source information in such a way that there is no loss of information. (Recover the original representation perfectly)
    • Reversible
    • Desirable in Medical Imagery
  • Lossy compression (Higher compression ratio)
    • The aim is to reduce the number of bits used to represent the source information by introducing acceptable loss of information. (Recover the representation to be similar to the original one)
    • Irreversible
    • Desirable in TV Broadcasting

img

Lossless Compression - Statistical redundancy

Based on probability of occurrence

  • Any compression algorithm associated with text must be lossless
  • since the loss of just a single character could modify the meaning of a complete string.

Entropy encoding is lossless and independent of the type of information that is being compressed.

Make use of entropy.

The entropy $H$ of $N$ random variables $s_i$, where $i = 1, 2, \ldots, N$, with corresponding probabilities $p_i$, is given by

$$H=\sum_{i=1}^{N} p_{i} \log_{2} \frac{1}{p_{i}}=-\sum_{i=1}^{N} p_{i} \log_{2} p_{i}$$

From a video coding perspective, entropy is a measure of

  • the amount of information associated with the set of coder input values, and
  • it gives a lower bound on the average number of bits required to code these inputs.
  • Entropy of a source: the theoretical minimum average number of bits required to transmit a particular source stream (see the sketch below).
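
As a quick illustration (a minimal sketch; the four symbol probabilities are arbitrary values chosen for the example), the entropy of a source can be computed directly from its probability distribution:

```python
import math

def entropy(probabilities):
    """Entropy H = -sum(p_i * log2(p_i)), in bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Four symbols with assumed probabilities
probs = [0.5, 0.25, 0.125, 0.125]
print(entropy(probs))  # 1.75 bits/symbol: the lower bound on the average codeword length
```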

Run-length encoding

  • Encode consecutive 0s and consecutive 1s

Example:

000001111111000011 => 5,7,4,2 => 101 111 100 010

The compression ratio in this example is 18/12 = 1.5 (each run length is coded with a fixed 3-bit codeword).
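
A minimal sketch of this run-length scheme (assuming, as in the example above, that the input starts with a run of 0s and that each run length fits in a fixed 3-bit codeword):

```python
def run_lengths(bits):
    """Lengths of consecutive runs of identical bits."""
    runs, count = [], 1
    for prev, cur in zip(bits, bits[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs

def encode_runs(runs, width=3):
    """Encode each run length as a fixed-width binary codeword."""
    return " ".join(format(r, f"0{width}b") for r in runs)

source = "000001111111000011"
runs = run_lengths(source)       # [5, 7, 4, 2]
print(runs, encode_runs(runs))   # 101 111 100 010  (12 bits vs. 18 source bits)
```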

Variable-length Codeword Assignment

Statistical encoding

  • Uses a set of variable-length codeword (VLC) with
    • Less bits assigned to more frequently occurring symbols
    • More bits assigned to symbols that appear rarely.

Why not use uniform-length codeword assignment?

img

Although uniform-length codeword assignment is the simplest and easiest to implement, it is not optimal.

  • In practice, some symbols are more likely to occur than others.
  • A codeword of a different length is assigned to each symbol according to its probability of occurrence.

Major Criteria of VLC assignment

  • The assignment must be uniquely decodable
  • The representation should be optimal in terms of the required average bit rate

Huffman coding

Huffman coding is a systematic technique for assigning variable-length codewords.

  • A simple approach that yields an average codeword length close to the source entropy

The first step is to create a series of source reductions:

  • Order the symbols by their probabilities.
  • Combine the two lowest-probability symbols into a single symbol. This process is then repeated until a reduced source with only two symbols remains.

Huffman encoding

Suppose we have symbols $a_1$ to $a_6$. We first sort them by probability (from highest to lowest).

In each step, we combine the symbol with the lowest probability and the symbol with the second-lowest probability, then sort again.

We then assign codewords, from the shortest to the longest.

The assignment procedure is as follows:

  • First we have the fully reduced group $a_{643512}$.
    • Break it into two groups; assign 1 to the higher-probability group and 0 to the lower-probability group.
      • If the two groups have the same probability (e.g. $a_4$ and $a_{35}$ both have 0.1), assign 1 to the group with more elements (i.e. $a_{35}$ gets 1, $a_4$ gets 0).
  • If a group contains more than one symbol, keep breaking it down until every symbol stands alone.
    • The process ends when every symbol has a codeword assignment.
  • Stage 4 -> 3 -> 2 -> 1

In the figure, the symbols A–F are denoted $a_1$ to $a_6$.

Note that the resulting average codeword length is close to the entropy; Huffman coding gets as close to it as any symbol-by-symbol code can.
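
A minimal sketch of the source-reduction procedure using a priority queue. The probabilities below are assumed textbook-style values for illustration, and ties may be broken differently from the worked example above, so individual codewords can differ while the average length stays the same:

```python
import heapq

def huffman_codes(probabilities):
    """Build Huffman codewords from a {symbol: probability} mapping."""
    # Heap entries: (probability, tie-breaker, {symbol: partial codeword})
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, low = heapq.heappop(heap)     # lowest-probability group
        p2, _, high = heapq.heappop(heap)    # second-lowest group
        merged = {s: "0" + c for s, c in low.items()}          # 0 -> lower probability
        merged.update({s: "1" + c for s, c in high.items()})   # 1 -> higher probability
        heapq.heappush(heap, (p1 + p2, counter, merged))
        counter += 1
    return heap[0][2]

probs = {"a1": 0.4, "a2": 0.3, "a3": 0.1, "a4": 0.1, "a5": 0.06, "a6": 0.04}  # assumed
codes = huffman_codes(probs)
avg_len = sum(probs[s] * len(c) for s, c in codes.items())
print(codes, avg_len)   # average 2.2 bits/symbol, close to the entropy of about 2.14 bits
```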

Huffman Decoding

  • Prefix Property: A shorter codeword will never form the start of a longer codeword.

  • The received bitstream can be decoded simply by carrying out a recursive search bit by bit until each valid codeword is found.

  • Use a State machine

    • Move 1 step forward when receiving a 0/1.
    • Output symbol X and reset the machine when state X is reached.

E.g. 101101010

Start from the first bit:

1 -> a2, then reset state

0 -> 01 -> 011 -> a1, then reset state

0 -> 01 -> 010 -> 0101 -> 01010 -> a3, then reset state.

Another illustration would be like this:

  • Carrying out a recursive search bit by bit until each valid codeword is found.

img
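
A minimal sketch of this bit-by-bit prefix decoding; the codebook is inferred from the decoding example above (a2 = 1, a1 = 011, a3 = 01010) and any prefix-free codebook would work the same way:

```python
def decode(bitstream, codebook):
    """Extend a bit buffer until it matches a valid codeword, then reset (state-machine style)."""
    inverse = {code: symbol for symbol, code in codebook.items()}
    symbols, buffer = [], ""
    for bit in bitstream:
        buffer += bit
        if buffer in inverse:            # a valid codeword (an "output state") is reached
            symbols.append(inverse[buffer])
            buffer = ""                  # reset the state machine
    return symbols

codebook = {"a1": "011", "a2": "1", "a3": "01010"}   # inferred from the example above
print(decode("101101010", codebook))                 # ['a2', 'a1', 'a3']
```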

Advantages and Disadvantages of Huffman Coding

Huffman coding is one of the most efficient codeword assignment schemes.

  • It provides an average number of bits per symbol that is close to the entropy.

However:

  • It is difficult to construct the codebook when the number of inputs is large
  • It needs to know the probability distribution of the input symbols in advance
  • It cannot adjust the codewords to handle non-stationary inputs

Lossy Compression - Spatial redundancy

Based on correlation of neighboring pixels

Spatial redundancy is exploited by image compression algorithms.

Image compression algorithms typically consist of:

  • Quantization,
  • DCT-based coding,
  • Huffman coding and
  • run-length coding

The following example uses the JPEG compression standard.

Lossy Image Encoder (JPEG)

img

Transformation

Transformation - Block Subdivision

Transformation: Block Subdivision itself is lossless.

There is strong correlation between the intensities of pixels that are spatially close together within a frame.

  • In transform image coding, an image is divided into a number of 2D blocks.
  • Each block is then coded separately.
  • The block size depends on the application; a typical block size is 8×8.
img

Transformation - Discrete Cosine Transform

Transformation: DCT itself is lossless.

A DCT-based image coding system is described as lossy because it includes quantization; the DCT itself is lossless.

The 2-D Discrete Cosine Transform converts each subdivided block from the pixel domain to a frequency domain in which it can be encoded more efficiently (good energy compaction property).

img

Major objectives of transformation:

  • To make as many transform coefficients as possible small enough so that they are insignificant and need not be coded for transmission.

  • To minimize statistical dependence, i.e., correlation between the transform coefficients.

The relatively good capability for bit-rate reduction mainly comes from two mechanisms:

  • Not all transform coefficients need to be transmitted.
  • Coefficients need not be coded with full accuracy.

In present-day transform image coding, the Discrete Cosine Transform (DCT) is usually used.

Properties of transform coefficients:

  • Most of them have insignificant magnitude.
    • We only want to handle those values with significant magnitude.
  • They are highly uncorrelated.

2D-DCT Computation Features

  • DC coefficient
    • For u=v=0, the two cosine terms are both 1
    • F(0,0) is simply a function of the summation of all the values in the input matrix.
    • It is proportional to the mean of all 64 values in the block.
  • Others are known as AC coefficients
    • Most AC coefficients have zero or near-zero values
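
A minimal sketch of the 8×8 forward and inverse 2-D DCT, assuming SciPy is available (`scipy.fft.dctn` computes the orthonormal DCT-II); the block values are an arbitrary ramp chosen for illustration:

```python
import numpy as np
from scipy.fft import dctn, idctn

# Arbitrary 8x8 block of pixel values (a horizontal ramp), level-shifted by -128 as in JPEG
block = np.tile(np.arange(8) * 16, (8, 1)).astype(float)
shifted = block - 128.0

coeffs = dctn(shifted, norm="ortho")       # 2-D DCT-II (orthonormal)
print(coeffs[0, 0])                        # DC coefficient, proportional to the block mean
print(np.round(coeffs[0, :4], 1))          # energy concentrates in the low-frequency coefficients

reconstructed = idctn(coeffs, norm="ortho") + 128.0
print(np.allclose(reconstructed, block))   # True: the DCT itself is lossless
```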

Quantization

Quantization itself is lossy.

Discarding information which is not visually significant

img

Consider an example of quantization:

img

  • If we increase the quantization step size,
    • the compression ratio (CR) will increase, and
    • the quality will decrease.

This is why a transformation is applied before quantization: it allows the compression ratio to be increased further.

  • Lower frequency coefficients (small quantization step size)
    • finer quantization
  • High frequency coefficients (large quantization step size)
    • coarser quantization
    • many become zero after quantization

img
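
A minimal sketch of the quantization step for an 8×8 block of DCT coefficients. The coefficient values and the quantization matrix below are assumed for illustration (not the actual JPEG tables); dividing by the step size and rounding is what discards information:

```python
import numpy as np

# Illustrative quantization matrix (assumed): the step size grows with frequency,
# so low-frequency coefficients are quantized finely and high-frequency ones coarsely.
u, v = np.meshgrid(np.arange(8), np.arange(8), indexing="ij")
Q = 8 + 4 * (u + v)                # step size 8 at DC, up to 64 at the highest frequency

def quantize(coeffs, Q):
    return np.round(coeffs / Q).astype(int)    # lossy: rounding discards detail

def dequantize(levels, Q):
    return levels * Q                          # the decoder can only approximate the original

coeffs = np.random.default_rng(0).normal(scale=50, size=(8, 8))   # stand-in DCT coefficients
levels = quantize(coeffs, Q)
print(np.count_nonzero(levels == 0))           # zeros appear mostly at the high frequencies
```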

Codeword Assignment (Entropy Coding)

Entropy Coding itself is lossless.

Comprise four steps

  • Vectoring (zig-zag scanning)
  • Differential coding
  • Run-length coding
  • Huffman coding

DC coefficients use Differential coding + Huffman coding

AC coefficients use Run-length coding + Huffman coding

Vectoring - Zig-Zag scanning

img

img

Placing the lower-frequency coefficients before the higher-frequency ones

  • generates more consecutive zeros, and
  • permits efficient use of run-length coding.
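
A minimal sketch of the zig-zag scan (shown on a 4×4 block for brevity; the same anti-diagonal, direction-alternating pattern is used on 8×8 blocks):

```python
def zigzag_scan(block):
    """Scan an NxN block along anti-diagonals, alternating direction."""
    n = len(block)
    order = sorted(
        ((i, j) for i in range(n) for j in range(n)),
        key=lambda ij: (ij[0] + ij[1],                                   # anti-diagonal index
                        ij[1] if (ij[0] + ij[1]) % 2 == 0 else -ij[1]),  # alternate direction
    )
    return [block[i][j] for i, j in order]

block = [[6, 4, 1, 0],
         [3, 2, 0, 0],
         [1, 0, 0, 0],
         [0, 0, 0, 0]]
print(zigzag_scan(block))   # [6, 4, 3, 1, 2, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```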

Coding DC coefficients - Differential Predictive Coding + Huffman coding

Differential coding is used when the amplitudes of the values that make up the source information cover a large range, but the difference between successive values is relatively small.

There is usually a strong correlation between the DC coefficients of adjacent N×N blocks.

  • The DC coefficients are encoded using the predictive coding techniques.

img

img

  • Difference values are further encoded using Huffman coding

img

  • SIZE => the number of extra bits needed
  • Huffman codeword + Extra bits => transmission

The extra bits are encoded in a SIZE-bit plain binary format (1’s complement if negative).

That is, they are the binary code of the difference value.

Example:

  • Difference value = 14 -> 1110
  • Difference value = -14 -> 1’s complement of 1110 -> 0001
  • Difference value = 0 -> no extra bits (“-”)

img

$$\operatorname{SIZE}=\lceil\log_{2}(|x|+1)\rceil$$

where $x$ is the DC difference value.

img
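
A minimal sketch of forming the SIZE category and the SIZE-bit extra bits for a DC difference value (1's complement for negative values, as described above); the Huffman codeword for each SIZE category would then be looked up in the standard table, which is not reproduced here:

```python
import math

def dc_size(diff):
    """SIZE = ceil(log2(|x| + 1)); 0 when the difference is zero."""
    return math.ceil(math.log2(abs(diff) + 1))

def dc_extra_bits(diff):
    """SIZE-bit binary representation of the difference (1's complement if negative)."""
    size = dc_size(diff)
    if size == 0:
        return ""                                   # no extra bits for a zero difference
    if diff > 0:
        return format(diff, f"0{size}b")
    return format(~abs(diff) & ((1 << size) - 1), f"0{size}b")   # 1's complement

for d in (14, -14, 0):
    print(d, dc_size(d), dc_extra_bits(d))
# 14 -> SIZE 4, bits '1110';  -14 -> SIZE 4, bits '0001';  0 -> SIZE 0, no extra bits
```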

Why is differential coding, or Differential Pulse Code Modulation (DPCM), used for the DC coefficients in JPEG coding, rather than coding each DC value separately?

Hint: consider the properties of Huffman coding.

Coding AC coefficients - Run-length coding + Huffman coding

For AC coefficients:

  • Convert zig-zag scanned ac coefficients into an intermediate symbol sequence.
  • Encode the intermediate symbol sequence using Huffman (or arithmetic) coding.

Intermediate symbol conversion of AC Coefficients:

Each non-zero AC coefficient is represented by a pair of symbols:

  • symbol-1 (RUNLENGTH, SIZE) i.e. assign a Codeword
    • RUNLENGTH = the number of consecutive zero-valued ac coefficients preceding the non-zero ac coefficient \in [0,15]
    • SIZE = the minimum number of bits used to encode AMPLITUDE \in [0 to 10 bits]
      • the minimum number of bits required to represent the absolute value of the AC coefficient in binary format.
  • symbol-2 (AMPLITUDE)
    • AMPLITUDE = in the range of [-1023, 1024]

Example: if the zig-zag scanned sequence of ac coefficients is {0, 0, 0, 0, 0, 0, 476}

=> (6, 9)(476)

where 6 is the number of consecutive zeros preceding 476, and $\lceil\log_2(476+1)\rceil = 9$.

To determine the SIZE value for an AC:

$$\operatorname{SIZE}=\lceil\log_{2}(|x|+1)\rceil$$

where $x$ is the non-zero AC coefficient.

SIZE = 0 is reserved for the DC case (the AC coefficients coded here are never zero); apart from that, the DC and AC coefficients basically share the same SIZE table.

If RUNLENGTH is > 15:

  • use symbol-1 (15,0) to represent RUNLENGTH = 16

Example: if the sequence of AC coefficients is {0, 0, …, 0, 12}, where there are 39 zeros,

=> (15,0)(15,0)(7,4)(12)

  • After coding the last non-zero AC coefficient in a block, symbol-1 (0,0) is sent => End Of Block (EOB)
    • EOB usually occurs well before the mid-point of the array, so it makes effective use of the EOB symbol code.
    • EOB appears frequently, hence an efficient (4-bit) codeword is assigned to it (i.e. it is treated as very probable).
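
A minimal sketch of converting one block's zig-zag scanned AC coefficients into the (RUNLENGTH, SIZE)(AMPLITUDE) intermediate symbols, including the (15,0) extension and the (0,0) EOB marker:

```python
import math

def ac_symbols(ac):
    """Intermediate symbols for one block's zig-zag scanned AC coefficients."""
    symbols, run = [], 0
    for coeff in ac:
        if coeff == 0:
            run += 1
            continue
        while run > 15:                           # (15, 0) stands for a run of 16 zeros
            symbols.append(((15, 0), None))
            run -= 16
        size = math.ceil(math.log2(abs(coeff) + 1))
        symbols.append(((run, size), coeff))      # symbol-1 (RUNLENGTH, SIZE), symbol-2 AMPLITUDE
        run = 0
    symbols.append(((0, 0), None))                # End Of Block
    return symbols

print(ac_symbols([0, 0, 0, 0, 0, 0, 476]))   # [((6, 9), 476), ((0, 0), None)]
print(ac_symbols([0] * 39 + [12]))           # [((15, 0), None), ((15, 0), None), ((7, 4), 12), ((0, 0), None)]
```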

Performance of Codec (CR)

Compression Ratio, $CR$:

$$CR=\frac{\text{Size of original data}}{\text{Size of compressed data}}$$

Number of bits per pixel, $N_b$:

$$N_{b}=\frac{\text{encoded number of bits}}{\text{number of pixels}}$$

There is a trade-off between the compression ratio and the picture quality

Performance of Codec (Quality)

Peak Signal-to-Noise ratio (PSNR)

  • PSNR approximates the human perception of reconstruction quality.

$$PSNR=10 \log_{10}\left[\frac{\sum_{y} \sum_{x} 255^{2}}{\sum_{y} \sum_{x}\left\{f(x, y)-\hat{f}(x, y)\right\}^{2}}\right]$$

The denominator, $\sum_{y}\sum_{x}\{f(x,y)-\hat{f}(x,y)\}^2$, divided by the number of pixels is the mean squared error (MSE); the ratio then equals $255^2/\mathrm{MSE}$.
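
A minimal sketch of computing the PSNR between an original and a reconstructed 8-bit image (the image and the noise below are assumed data for illustration):

```python
import numpy as np

def psnr(original, reconstructed, peak=255.0):
    """Peak signal-to-noise ratio in dB for 8-bit images."""
    mse = np.mean((original.astype(float) - reconstructed.astype(float)) ** 2)
    if mse == 0:
        return float("inf")            # identical images
    return 10 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(0)
original = rng.integers(0, 256, size=(64, 64))
noisy = np.clip(original + rng.normal(scale=5, size=original.shape), 0, 255)
print(psnr(original, noisy))           # roughly 34 dB for noise of this magnitude
```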

Lossy Compression - Temporal redundancy

Motion JPEG (M-JPEG)

Motion JPEG (M-JPEG) is a digital video sequence represented as a series of JPEG images.

  • The higher the compression level, the lower the file size and image quality.
  • Advantages:
    • flexibility both in terms of quality and compression ratio.
    • There is no dependency between frames, so it is robust: if one frame is dropped during transmission, the rest of the video is unaffected.
    • M-JPEG is popular in applications where individual frames in a video sequence are required (e.g., for video analytics)
  • Disadvantages:
    • It makes no use of inter-frame video compression techniques => lower compression ratio (10:1 to 24:1) for video sequences.

Temporal redundancy is based on similarity of neighboring frames.

Consider:

  • An object moving across the screen in a movie
  • lasts for at least 3 s.
  • Assume a frame rate of 60 frames per second -> 180 frames.
  • By sending only the information relating to those segments of each frame that have movement associated with them,
  • considerable savings in bandwidth are possible.

img

Frame Difference Coding

The difference frame is encoded as an image using the JPEG encoding method; at the decoder it is decoded again and added back to frame t-1 to reconstruct frame t.

img

Disadvantage of Frame Difference Coding

  • If there is large motion between two frames, the frame difference signal becomes larger.
    • The compression ratio decreases.

Motion-Compensated Prediction

Motion estimation and compensation

Encoder side

  • Perform motion estimation to produce a motion-compensated prediction frame.
  • Subtract the motion-compensated frame from the current frame t to form the residual.
  • The encoder also transmits the motion information (motion vectors) to the decoder.

Motion estimation is an expensive process, so the encoder is more expensive while the decoder is cheap; only a decoder is needed to watch the video.

Decoder side

  • Use the received motion information to create a motion-compensated frame.
  • Add the motion-compensated frame to the decoded residual to reconstruct frame t.

If the motion cannot be estimated perfectly, a prediction error (residual) remains and must be coded.

img

Advantage of Motion-Compensated Prediction

  • The compression ratio increases substantially.

The Video Encoder and Decoder

img

img

Motion compensation is performed in both the encoder and the decoder, but motion estimation is needed only in encoder => asymmetric property

Motion Estimation: Block Matching Algorithm

The displacement of the 16×16 macroblock in the reference frame that gives the best match under a given matching criterion is chosen as the motion vector (mv).

img

Matching Criterion: Sum of Absolute Difference

img

Motion vector (mv) = (u,v)

A smaller SAD value means the two blocks are closer to each other.

A SAD value of 0 means the two blocks are identical.

  • Full searching algorithm for SAD
    • most accurate
    • heavy computation
  • Fast searching algorithm for SAD
    • reduced computation complexity
    • deterioration in picture quality
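
A minimal sketch of full-search block matching with the SAD criterion (the frames, block position, and search range below are assumed toy values):

```python
import numpy as np

def full_search(current, reference, top, left, N=16, D=8):
    """Return the motion vector (u, v) that minimizes SAD for one NxN macroblock."""
    block = current[top:top + N, left:left + N].astype(int)
    best_sad, best_mv = None, (0, 0)
    for u in range(-D, D + 1):                        # vertical displacement
        for v in range(-D, D + 1):                    # horizontal displacement
            i, j = top + u, left + v
            if i < 0 or j < 0 or i + N > reference.shape[0] or j + N > reference.shape[1]:
                continue                              # candidate block falls outside the frame
            candidate = reference[i:i + N, j:j + N].astype(int)
            sad = np.abs(block - candidate).sum()     # 256 |.|, 256 -, 255 + operations per SAD
            if best_sad is None or sad < best_sad:
                best_sad, best_mv = sad, (u, v)
    return best_mv, best_sad

# Toy frames (assumed): the current frame is the reference shifted down by 2 and right by 3
rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(64, 64))
current = np.roll(np.roll(reference, 2, axis=0), 3, axis=1)
print(full_search(current, reference, top=16, left=16))   # ((-2, -3), 0): identical blocks, SAD = 0
```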

Video compression Picture Types

There are 3 types of picture used in digital video coding.

img
  • Intra (I) picture
    • coded without reference to any other pictures
    • provides access points to the coded sequence where decoding can begin
    • coded with only moderate compression (10:1 to 24:1)
  • Predicted (P) picture
    • coded with reference to a previous I-picture or P-picture
    • generally used as a reference for further prediction
    • compression ratio is about 20:1 to 50:1

If there is an error in a P-picture, the error will propagate until the end of the sequence.

e.g. IPPPPPPPPP

Alternatively, if we start watching in the middle, a P-picture has no reference I-frame and cannot be reconstructed.

e.g. PPPPPPPP

This is undesirable. One solution is to set up periodic access points (I-frames).

e.g. IPPPPIPPPPIPPPP

Each group of pictures starting at an I-frame (e.g. IPPPP) is called a GOP (Group of Pictures).

The GOP length is the number of pictures in the GOP (the length is 5 in the example above).

  • Bidirectional (B) picture
    • uses motion-compensated prediction from a past and a future I- or P-picture
    • never used as a reference for further prediction
    • highest compression ratio (50:1 to 100:1)
    • B-frames need to perform motion estimation twice (forward and backward)
img

Performance comparing the use without B picture and with B picture

img img

The bitrate consumed by a Bidirectional (B) picture should be the smallest.

Coding of I-Picture

img

  • Intraframe coding
  • 8x8 DCT
  • Any weighting matrix for DCT coefficients possible
  • Differential coding of DC coefficient
  • Uniform quantization
  • Run length coding of zeros with zig-zag scan
  • Entropy coding

Coding of P-Pictures

The IBBBPBBBI example is used here; the P-picture being coded is frame 5.

img

  • Motion compensated prediction from an I- or P-pictures
  • One motion vector per macroblock
  • Coding of prediction error with 8x8 DCT, uniform quantization, zig-zag scan (like I-picture)

Coding of Motion Vectors

  • Highly correlated among neighboring macroblocks
  • In many standards, macroblock is the unit for motion estimation. It will be fine tuned to the block level if necessary
  • Code the difference using variable length code (VLC)

Coding of B-Pictures

The IBBBPBBBI example is used here again.

img

Transmission Order and Display Order

img

Before we encode a B-picture, note that it needs both a past and a future I- or P-picture.

  • The past and future I- or P-pictures must therefore always be encoded first, which changes the transmission order.
  • Therefore, the display order is different from the encoding (transmission) order.

Another example:

A video surveillance system uses the frame sequence shown below. Derive a suitable reordered sequence that ensures, firstly, that only two frames must be stored in the decoder and, secondly, that the required I- and/or P-frames are available to decode each P- and B-frame as they are received.

img

Answer: IPBBPBBBPBBIBBPBB ----

Compression Latency

When compression takes place, one or several mathematical algorithms are used to remove image data.

  • This process requires a certain amount of time, and the resulting delay is known as compression latency.
  • The more advanced the compression algorithm, the higher the latency, given the same processing power.
  • Latency of B-frame > P-frame > I-frame
    • CR of B-frame > P-frame > I-frame
    • Complexity of B-frame > P-frame > I-frame
    • B-frame need to do the motion estimation twice
  • The latency of a network, however, must be considered when designing a video broadcast system.
  • For some applications, such as the compression of studio movies, compression latency is irrelevant because the video is not viewed live.
  • In live broadcast applications, low latency is essential.

The H.264 Video Coding Standard

  • Multi-mode, multi-reference Motion Estimation
  • 4X4 block integer transform

img

  • Intra-frame Prediction

Variable Block Size Motion Estimation and Motion Compensation

The size of moving/stationary objects is variable, hence there is a trade-off in terms of the block size.

In H.264, the arrangement is flexible, each 16×16 macroblock may be:

  • Whole block
  • Two sub-blocks (16×8 or 8×16) vertically
  • 4 sub-blocks (8×8 each)
  • In the last case, each 8×8 sub-block may be divided once more, so the smallest size is 4×4.
img
  • Using smaller sub-blocks makes the motion prediction more precise.

The encoder selects the “best” partition size for each part of the frame, to minimize the coded residual and motion vectors.

The macroblock partitions chosen for each area are shown superimposed on the residual frame.

  • little change between the frames (residual appears grey) → a 16×16 partition is chosen
  • detailed motion (residual appears black or white) → smaller partitions are more efficient.
img

The 4×4 Integer Transform

img

Summary of H.264

  • Video coding is based on hybrid video coding and similar in spirit to other standards but with important differences
  • New key features are:
    • Enhanced motion compensation
    • Small blocks for transform coding
    • Enhanced entropy coding
  • Substantial bit-rate savings (up to 50%) relative to other standards for the same quality
  • Enhancement on perceptive quality seems better than that on PSNR
  • The complexity of the encoder triples that of the prior ones
  • The complexity of the decoder doubles that of the prior ones

Rate-Distortion (RD) Theory

Rate-distortion theory addresses the problem of determining the minimal number of bits per symbol, as measured by the rate R, that should be communicated over a channel so that the source (input signal) can be approximately reconstructed at the receiver (output signal) without exceeding a given distortion D.

RD curve

  • Only 4 points are used, each plotting the PSNR and bitrate at a different quantization step size.
  • The curve lets us compare the performance of two encoders.
img img
  • A higher quantization step size gives a smaller PSNR and a smaller bitrate.

The choice of encoder and operating point depends on the application.

Fast Motion Estimation

Revisit the Block Matching Motion Estimation

  • Motion Estimation is the critical step to remove temporal redundancy

Operations per SAD (Full Search Algorithm)

  • absolute-value operations |·| = 256 (16×16 pixels)
  • subtractions (-) = 256
  • additions (+) = 255
  • Total operations = 256 + 256 + 255 = 767
img

Operations per second (Full Search Algorithm)

img
  • Both the horizontal and the vertical direction need to search $2D+1$ positions
  • Number of SADs to be computed per macroblock = $(2D+1)^2$
  • Number of operations per macroblock = $(2D+1)^2 \times 767$
  • Assume a frame rate of 30 fps; number of operations per second = $30 \times \frac{W}{16} \times \frac{H}{16} \times 767(2D+1)^2$

The value of D depends on the application (see the worked example below).
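
A small worked example of the operation count above; the frame size, frame rate, and search range are assumed values chosen for illustration:

```python
def full_search_ops_per_second(width, height, D, fps=30, ops_per_sad=767):
    """Operations per second for full-search block matching with 16x16 macroblocks."""
    macroblocks = (width // 16) * (height // 16)
    sads_per_macroblock = (2 * D + 1) ** 2       # (2D+1) candidate positions in each direction
    return fps * macroblocks * sads_per_macroblock * ops_per_sad

# Assumed example: 1920x1080 at 30 fps with a search range of D = 15
print(f"{full_search_ops_per_second(1920, 1080, 15):.3e}")   # roughly 1.8e11 operations per second
```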

Fast Searching Algorithm: n-Step Hierarchical Search (n-SHS)

  • Instead of searching all locations, only a limited set of locations is searched
  • the location with the minimum SAD gives the motion vector

First, we have a search window size:

$$\text{Search Window Size } S = \left\lceil\frac{D}{2}\right\rceil$$

So we will search the 9 points:

$$(0,0),\ (0,S),\ (0,-S),\ (S,S),\ (S,0),\ (S,-S),\ (-S,S),\ (-S,0),\ (-S,-S)$$

img

After finding the point with minimum SAD,

$$\text{Second Search Window Size } S_2 = \left\lceil\frac{S}{2}\right\rceil$$

Then we use that point with minimum SAD as center to find another 9 points. (including itself)

Suppose the top-right point has the minimum SAD.

img

After finding the point with minimum SAD,

$$\text{Third Search Window Size } S_3 = \left\lceil\frac{S_2}{2}\right\rceil$$

Then we use that point with minimum SAD as center to find another 9 points. (including itself)

Suppose the top point has the minimum SAD.

img

If the point with the minimum SAD stays the same and the search size is already 1, the search has converged and we need not proceed further.

Then that is the motion vector.

Note that this algorithm saves a large number of operations (see the sketch below).
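
A minimal sketch of the n-step hierarchical search for one macroblock (SAD and the toy frames as in the full-search sketch; the 9 candidate points and the halving step size follow the description above, with a simplified stopping rule of halting once the step size reaches 1):

```python
import math
import numpy as np

def sad(current, reference, top, left, u, v, N=16):
    block = current[top:top + N, left:left + N].astype(int)
    cand = reference[top + u:top + u + N, left + v:left + v + N].astype(int)
    return np.abs(block - cand).sum()

def n_step_search(current, reference, top, left, D=8, N=16):
    """9 candidate points per step; the step size halves until it reaches 1."""
    step = math.ceil(D / 2)
    u, v = 0, 0                                      # current best displacement
    while step >= 1:
        candidates = [(u + du, v + dv) for du in (-step, 0, step) for dv in (-step, 0, step)]
        candidates = [(a, b) for a, b in candidates          # keep blocks inside the frame
                      if 0 <= top + a <= reference.shape[0] - N
                      and 0 <= left + b <= reference.shape[1] - N]
        u, v = min(candidates, key=lambda mv: sad(current, reference, top, left, *mv))
        if step == 1:
            break
        step = math.ceil(step / 2)
    return (u, v)

rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(64, 64))
current = np.roll(np.roll(reference, 2, axis=0), 3, axis=1)
print(n_step_search(current, reference, top=24, left=24))
# Estimated motion vector; unlike the full search it may settle in a local minimum (see below).
```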

Global and Local Minima

The 3-step search can be trapped in a local minimum.

  • A local minimum of the distortion function gives a predicted image with higher error than the full search.
  • So, many research works have been carried out to find alternative approaches for block matching motion estimation.

img

Example of FSA and n-SHS

img

img

img

img

D = 6; the number of search positions per dimension for FSA is 2D + 1 = 13.

img

D = 6

img

H.264/AVC

Revisit H.264

  • H.264/AVC is a standard for video compression, and is equivalent to MPEG-4 Part 10, or MPEG-4 AVC (for Advanced Video Coding)

Selected new features in H.264/AVC

  • Enhanced motion estimation and compensation (ME and MC) with variable block size
  • Multiple reference frames
  • Multi-mode intra-prediction (Intra Prediction)
  • Integer block transform
  • De-blocking filter

Multiple Reference Frames

img

Multi-picture motion compensation using previously-encoded pictures as references allows up to 32 reference pictures to be used in some cases

  • Very significant bit rate reduction for scenes with
    • rapid repetitive flashing
    • back-and-forth scene cuts
    • uncovered background areas
  • Every MacroBlock partition can have a different reference picture that is more appropriate for that particular block.

Drawbacks of Multiple Reference frames

  • Both the encoder and decoder have to store the reference frames used for Inter-frame prediction in a multi-frame buffer.
  • Dramatically increases the computational complexity of the encoders because the Motion Estimation (ME) process needs to be performed for each of the reference frames

Spatial model: Intra prediction

The coding of an I-frame in MPEG-1 and MPEG-2 is very similar to that of JPEG; the coding is mainly based on the DCT.

  • Motivation: Intra-frames are natural images, so they exhibit strong spatial correlation.
img

Many different approaches to intra prediction have been proposed.

img

H.264 has two intra prediction types: I4MB and I16MB.

I4MB (Intra_4×4 type): 9 modes

Intra prediction in H.264/AVC is always conducted in the spatial domain.

  • The current macroblock/block is predicted by the adjacent pixels in the upper and the left macroblocks/blocks that were decoded earlier.

img

img

I16MB (Intra_16×16 type): 4 modes

img

  • Mode 0: vertical
  • Mode 1: horizontal
  • Mode 2: DC
  • Mode 3: plane (a linear “plane” function is fitted to the upper and left-hand samples → useful in areas of smoothly-varying luminance)
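
A minimal sketch of the three simplest I16MB prediction modes (vertical, horizontal and DC), assuming the row of pixels above and the column of pixels to the left of the current macroblock have already been decoded; the plane mode and the 4×4 directional modes are omitted for brevity:

```python
import numpy as np

def intra_16x16_prediction(top_row, left_col, mode):
    """Predict a 16x16 macroblock from its decoded upper and left neighbours."""
    if mode == 0:                                    # vertical: copy the row above downwards
        return np.tile(top_row, (16, 1))
    if mode == 1:                                    # horizontal: copy the left column across
        return np.tile(left_col.reshape(16, 1), (1, 16))
    if mode == 2:                                    # DC: mean of the 32 neighbouring samples
        dc = int(round((top_row.sum() + left_col.sum()) / 32))
        return np.full((16, 16), dc)
    raise ValueError("mode 3 (plane) is not implemented in this sketch")

# Assumed neighbouring samples for illustration
top_row = np.arange(16) * 10
left_col = np.full(16, 80)
pred = intra_16x16_prediction(top_row, left_col, mode=2)
print(pred[0, 0])    # the DC prediction: (sum of the 32 neighbours) / 32
```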

Rate-Distortion Optimization (RDO)

The SAD (Sum of Absolute Differences) is one of the simplest metrics for selecting the most similar block from the reference frame.

  • It is always good to find a block in the reference frame with the minimum distortion.

  • However, when only the distortion is optimized, the bit-rate (R) of coding the block might be high:

    • Encode more motion vectors for small block size in H.264
    • Index of multiple reference frames
  • This increases the memory required for storage or the channel capacity required for transmission.

  • We can combine these two factors by optimizing an RDO (Rate-Distortion Optimization) cost function.

  • However, the distortion (D) term and the rate (R) term are two totally different quantities and they cannot be added directly.

  • One solution is to introduce a weighting factor λ which brings these two terms into the same entity for the sake of comparison:

$$J = D + \lambda R$$

where J is the RDO cost and λ is normally called the Lagrangian multiplier. In the literature, J is also referred to as the Lagrangian rate-distortion function.
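
A minimal sketch of rate-distortion optimized mode selection: for each candidate coding mode, compute its distortion D and rate R (the mode names and numbers below are assumed, illustrative values), then pick the mode that minimizes J = D + λR:

```python
def rd_cost(distortion, rate, lam):
    """Lagrangian rate-distortion cost J = D + lambda * R."""
    return distortion + lam * rate

# Assumed candidate modes: (name, distortion D in SAD units, rate R in bits)
candidates = [
    ("16x16 partition, 1 motion vector",   900,  30),
    ("8x8 partitions, 4 motion vectors",   520,  95),
    ("4x4 partitions, 16 motion vectors",  310, 260),
]

# A small lambda favours low distortion (fine partitions); a large lambda favours low rate.
for lam in (0.5, 5.0):
    best = min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))
    print(lam, "->", best[0], "J =", rd_cost(best[1], best[2], lam))
```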