Digital Image Processing - Image Compression
Digital Image Compression and Coding Standards
The major objective of image coding is to represent an image with as few bits as possible while preserving the level of quality and intelligibility required for the given application.
- Image coding exploits the spatial redundancy in the image.
Applications:
- Reduction of channel bandwidth required for image transmission.
- e.g. Video phone, Fax machine, Broadcast TV, HDTV, remote sensing via satellite, videoconferencing, etc.
- Reduction of the storage requirement for image storage.
- e.g. Storage of image data, graphic database, weather map, medical images, etc.
Criteria of quality (to measure the quality of the image):
- Objective criteria - MSE (Mean Square Error), etc.
- Subjective criteria - sharpness, etc.
Image quality is judged with respect to human observation, so it is largely a subjective matter.
Image compression system model
Source coding
- Compress data.
- Remove redundancy from data prior to transmission.
- e.g. Transform coding etc.
Channel coding
Telecommunication people worry about this; we won't go into it here.
- Allow detection or correction of transmission error.
- Add redundancy to data.
- e.g. HDLC etc.
Classification of image coding
Error-free compression:
Does not lose any information; we just reduce the redundancy.
- Bit-plane coding
- JBIG coding (a form of arithmetic coding developed by IBM)
- Run-length coding
Lossy compression:
Some less important information will be lost.
- Predictive coding
- Transform coding (JPEG)
- Subband coding
- Pyramid coding
- Vector quantization coding
- Block truncation coding
- Fractal coding
- Wavelet coding (JPEG2000)
Now we delve into source coding.
Image source coding model
In the source coding stage (we do not look at the channel coding stage here):
The encoder compresses the image; the decoder then reconstructs it.
- Compression stage: Mapper -> Quantizer -> Symbol Encoder
- Decompression stage: Symbol Decoder -> Inverse Mapper
Mapper
Maps the input set of data (pixel intensity) into another set of data which is easier to process.
- To save space
Examples :
We scan line by line, counting runs of white pixels and black pixels.
- {number of white pixels, number of black pixels}: always starting with a white run.
- E.g. {3, 4}, {4, 1} means 3 white pixels, 4 black pixels, then 4 white pixels, 1 black pixel (look at the first line).
- EOL means End of Line. If the remaining pixels are all white, we can just use EOL as the notation.
- In the last line there are 3 black pixels at first, so it is {0, 3}: 0 white, 3 black. (A code sketch follows below.)
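As a rough illustration, the run-length mapping above can be sketched in a few lines of Python (assuming 1 = white and 0 = black, and producing a plain list of run counts rather than actual codewords):

```python
def run_length_encode(line):
    """Encode a binary scan line (1 = white, 0 = black) as alternating
    white/black run counts, always starting with a white run."""
    runs = []
    current = 1        # by convention, the first run counts white pixels
    count = 0
    for pixel in line:
        if pixel == current:
            count += 1
        else:
            runs.append(count)   # close the run (0 if the line starts black)
            current = pixel
            count = 1
    runs.append(count)
    return runs

print(run_length_encode([1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0]))  # [3, 4, 4, 1]
print(run_length_encode([0, 0, 0, 1]))                           # [0, 3, 1]
```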
The DCT is like the DFT, but it produces only real coefficients (there is no imaginary part to keep).
- In the DCT domain, we keep the low-frequency coefficients.
- Normally the high-frequency coefficients can be ignored.
Quantizer
Quantizes the mapper output to a limited number of possible values.
- E.g. convert float values back to integers after transform coding
Examples :
- In the previous example, we used transform coding to MAP the integer values 0-255 into floating-point values.
- Since floating-point values are not friendly, we quantize the coefficients,
- i.e. limit the number of possible output levels so that they fit back in 0-255, and 8 bits are enough to store each pixel.
- Some information is lost.
- Originally, each pixel occupies 24 bits because of RGB (full color).
- We reduce the total number of colors by selecting an 8-bit subset of colors to store (based on an MSE criterion).
- Colors are selected through clustering (e.g. k-means), picking the mean of each cluster.
- The selected colors form a palette (which can use fewer than 8 bits).
- The palette acts as a look-up table.
- Some information is lost. (A small uniform-quantizer sketch follows below.)
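A minimal sketch of the quantization step, assuming a simple uniform step size (real coders pick a step per coefficient, as JPEG does later in these notes):

```python
import numpy as np

def uniform_quantize(coefficients, step):
    """Map floating-point coefficients onto integer levels.
    All values within one step collapse to the same level: information is lost."""
    return np.round(np.asarray(coefficients) / step).astype(int)

def dequantize(levels, step):
    """Approximately reconstruct coefficient values from the integer levels."""
    return np.asarray(levels) * step

coeffs = np.array([12.7, -3.2, 0.4, 101.9])
levels = uniform_quantize(coeffs, step=4.0)
print(levels)                   # [ 3 -1  0 25]
print(dequantize(levels, 4.0))  # [ 12.  -4.   0. 100.]  (not the original values)
```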
Coder
Performs codeword assignment to the quantizer output.
- If a combination occurs more often, a shorter codeword is assigned.
- If a combination occurs less often, a longer codeword is assigned.
- Codeword assignment is based on probability.
- The codewords are always unique.
- EOL can also be assigned a codeword if it is frequently used.
This kind of probability-based coding is called entropy coding.
Entropy Coding
Based on Probability.
Uniform-length codeword assignment
- Each pixel uses the same number of bits.
- e.g. 8 bits per index
- The average bit rate is simply the fixed codeword length.
- Simple and easy to implement.
- Not very efficient.
Variable-length codeword assignment
- Requires the concept of entropy.
- Coding methods include:
- Huffman coding
- Arithmetic coding
- Shift code
Entropy
Entropy is a measure of the amount of information associated with the set of coder input values and gives a lower bound on the average number of bits required to code these inputs.
Entropy $H$ of a random variable taking values $a_i$, $i = 1, \dots, N$, with corresponding probabilities $p_i$, is given by:
$$H = -\sum_{i=1}^{N} p_i \log_2 p_i$$
- $p_i$ is the probability of $a_i$.
- $a_i$ can be something like {2,0} or {3,0}, depending on what kind of symbol you want to transmit.
- E.g. transmit alphabet letters (i.e. abcdaae...), where each letter is an $a_i$.
- Note that since $p_i \le 1$, $\log_2 p_i$ is negative (or zero). To make the value positive, a "-" sign is added.
Example:
Suppose the probabilities of $a_1, \dots, a_6$ are 0.4, 0.3, 0.1, 0.1, 0.06 and 0.04 respectively.
Then the entropy of these 6 inputs:
$$H = -(0.4\log_2 0.4 + 0.3\log_2 0.3 + 0.1\log_2 0.1 + 0.1\log_2 0.1 + 0.06\log_2 0.06 + 0.04\log_2 0.04) \approx 2.14 \text{ bits}$$
That means on average we need about 2.14 bits to send each symbol.
This is just a theoretical lower bound (verified numerically in the sketch below).
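The computation is easy to check numerically; a minimal sketch using the probabilities from the example above:

```python
import math

def entropy(probabilities):
    """H = -sum(p * log2(p)): a lower bound on average bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

p = [0.4, 0.3, 0.1, 0.1, 0.06, 0.04]   # probabilities of a1..a6 above
print(f"{entropy(p):.2f} bits")        # 2.14
```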
In entropy coding, shorter codewords are assigned to more probable messages while longer codewords are assigned to less probable messages.
- If the probability of that message is high, a shorter codeword is assigned
- If the probability of that message is low, a longer codeword is assigned
Huffman coding
- Using Huffman coding, we can approach this bound and see how many bits each symbol needs.
Huffman encoding
Suppose we have $a_1$ to $a_6$; we first sort them by probability (from highest to lowest).
In each step, we combine the lowest probability with the second-lowest probability, then sort again.
Then we start to assign codewords from the shortest to the longest.
The assignment procedure works backwards through the stages (Stage 4 -> 3 -> 2 -> 1):
- First we take the final, fully combined set.
- Break it into 2; assign 1 to the higher probability and 0 to the lower probability.
- If the 2 segments have the same probability, assign 1 to the one containing more elements.
- If an element is not alone in its segment, break the segment further until it is alone.
- The process ends when all elements have a codeword assigned.
In the accompanying figure, the letters A-F stand for $a_1$-$a_6$.
Note that the resulting average length (2.2 bits/symbol here) is close to the entropy (2.14 bits); the entropy is only a theoretical bound, while Huffman coding gives the rate actually achievable.
To decode Huffman code:
- Use a State machine
- Move 1 step forward when receiving a 0/1.
- Output symbol X and reset the machine when state X is reached.
E.g. 101101010
Start at the initial state and move one step per received bit:
1 -> a2, then reset the state.
0 -> 01 -> 011 -> a1, then reset the state.
0 -> 01 -> 010 -> 0101 -> 01010 -> a3, then reset the state.
Advantages and Disadvantages of Huffman Coding
Huffman coding is one of the most efficient codeword assignments.
Huffman encoding is also used in JPEG compression.
However:
- It is difficult to construct the codebook when the number of inputs is large.
- The probability distribution of the input symbols must be known in advance.
- The codewords cannot be adjusted to handle non-stationary inputs.
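A minimal Huffman construction sketch using a heap, with the example probabilities from above. The exact bit patterns depend on tie-breaking and may differ from the figure, but any Huffman code for this source achieves the same 2.2 bits/symbol average:

```python
import heapq
from itertools import count

def huffman_code(probabilities):
    """Repeatedly merge the two least probable nodes; the branch labels
    collected along the way become the codewords (1 = higher probability)."""
    tiebreak = count()   # avoids comparing dicts when probabilities tie
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p_lo, _, lo = heapq.heappop(heap)   # least probable branch -> 0
        p_hi, _, hi = heapq.heappop(heap)   # next least probable -> 1
        merged = {s: "0" + c for s, c in lo.items()}
        merged.update({s: "1" + c for s, c in hi.items()})
        heapq.heappush(heap, (p_lo + p_hi, next(tiebreak), merged))
    return heap[0][2]

probs = {"a1": 0.4, "a2": 0.3, "a3": 0.1, "a4": 0.1, "a5": 0.06, "a6": 0.04}
codes = huffman_code(probs)
avg = sum(probs[s] * len(codes[s]) for s in probs)
print(codes)                      # a valid optimal prefix code
print(f"{avg:.2f} bits/symbol")   # 2.20, close to the 2.14-bit entropy
```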
Arithmetic coding
- In arithmetic coding, a message is represented by an interval of real numbers between 0 and 1.
- It is special because we don't assign codewords to individual symbols.
Arithmetic coding
Message: $a_1\,a_2\,a_3\,a_3\,a_4$
- Each incoming symbol narrows the range by which the message on hand is represented.
- The final message symbol narrows the range to [0.06752, 0.0688).
- Any number within this subinterval (e.g. 0.068) can be used to represent the message.
- For higher efficiency, one should use the value having the shortest binary form in the range [0.06752, 0.0688).
- How to Convert a decimal fraction to binary
- 0.06752 = 0.000100010XXXX…
- 0.0688 = 0.000100011XXXX…
Therefore the shortest binary form is 0.000100011 -> 9 bits to encode the message.
To decode the Arithmetic coding:
E.g. Decode 0.000100011 (that is 0.068)
0.068 is between 0 ~ 0.2 -> a1
0.068 is between 0.04 ~ 0.08 -> a2
0.068 is between 0.056 ~ 0.072 -> a3
0.068 is between 0.0624 ~ 0.0688 -> a3
0.068 is between 0.0672 ~ 0.0688 -> a4
Therefore the message is $a_1\,a_2\,a_3\,a_3\,a_4$.
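A sketch of the interval-narrowing process, assuming the symbol intervals implied by the decoding steps above (a1: [0, 0.2), a2: [0.2, 0.4), a3: [0.4, 0.8), a4: [0.8, 1.0)). A practical coder uses rescaled integer arithmetic to avoid the floating-point precision limit:

```python
# Cumulative probability intervals assumed from the decoding steps above.
ranges = {"a1": (0.0, 0.2), "a2": (0.2, 0.4), "a3": (0.4, 0.8), "a4": (0.8, 1.0)}

def encode(message):
    """Narrow [0, 1) once per symbol; any value in the final interval
    represents the whole message."""
    low, high = 0.0, 1.0
    for sym in message:
        s_low, s_high = ranges[sym]
        width = high - low
        low, high = low + width * s_low, low + width * s_high
    return low, high

def decode(value, n_symbols):
    """Reverse the narrowing: find which symbol interval the value falls in."""
    message, low, high = [], 0.0, 1.0
    for _ in range(n_symbols):
        scaled = (value - low) / (high - low)
        for sym, (s_low, s_high) in ranges.items():
            if s_low <= scaled < s_high:
                message.append(sym)
                width = high - low
                low, high = low + width * s_low, low + width * s_high
                break
    return message

print(encode(["a1", "a2", "a3", "a3", "a4"]))  # ~(0.06752, 0.0688)
print(decode(0.068, 5))                        # ['a1', 'a2', 'a3', 'a3', 'a4']
```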
Advantages and Disadvantages of Arithmetic Coding
Arithmetic coding does not require a fixed probability distribution of the symbols in advance.
A one-to-one correspondence between source symbols and codewords does not exist.
It can assume any arbitrary distribution and update it as the symbols are transmitted/received in the encoder/decoder. (It's adaptive!)
It is one of the two entropy coding techniques in JPEG (the other is Huffman coding).
However:
- As the message becomes longer, the interval needed to represent it becomes smaller, and more bits are needed to specify that interval.
Shift code
Also known as the $S_n$ code, where $n$ is the segment length.
- Codewords are built by concatenating binary segments of fixed length $n$.
- There are $2^n$ distinct $n$-bit codewords.
- $2^n - 1$ of these codewords are assigned to the first $2^n - 1$ input values.
- The last codeword (all 1s) is used to signify that the input is outside this range.
Example: implementation of the $S_2$ (2-bit) code
There are 4 distinct 2-bit codewords.
- The 4 distinct codewords: 00, 01, 10, 11
- 00, 01 and 10 are assigned to the first 3 input values:
- $a_1 \to 00$, $a_2 \to 01$, $a_3 \to 10$
- 11 is used to signify that the input is outside this range (the shift codeword).
- The shift codeword is prepended to 00, 01 and 10 respectively:
- $a_4 \to 1100$, $a_5 \to 1101$, $a_6 \to 1110$, ...
- This process is repeated until all inputs are assigned a codeword.
Example: implementation of the $S_3$ (3-bit) code
There are 8 distinct 3-bit codewords.
- The 8 distinct codewords: 000, 001, 010, 011, 100, 101, 110, 111
- The first 7 codewords are assigned to the first 7 input values.
- 111 is used to signify that the input is outside this range (the shift codeword).
- The shift codeword is prepended to the 7 codewords respectively.
- This process is repeated until all inputs are assigned a codeword.
Advantages of Shift code
- easy to implement
- reasonably efficient for inputs with monotonically decreasing probabilities
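A small sketch that generates shift codewords for segment length n, reproducing the 2-bit example above:

```python
def shift_code(num_inputs, n):
    """First 2**n - 1 n-bit patterns are codewords; the all-ones pattern
    means 'shift': prepend it once per group of 2**n - 1 further inputs."""
    base = [format(k, f"0{n}b") for k in range(2**n - 1)]   # e.g. 00, 01, 10
    shift = "1" * n                                          # e.g. 11
    codewords, prefix = [], ""
    while len(codewords) < num_inputs:
        for word in base:
            codewords.append(prefix + word)
            if len(codewords) == num_inputs:
                break
        prefix += shift
    return codewords

print(shift_code(7, 2))
# ['00', '01', '10', '1100', '1101', '1110', '111100']
```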
Comparison among different codes
JPEG standard
Joint Photographic Experts Group
- Intraframe coding method for still images
- Flexible image dimensions
- Each component coded separately; any colorspace possible, but best compression for YUV
- Flexible compression ratio
- A compression ratio of 24:1 without loss of subjective quality is claimed.
JPEG is:
- popular format for Web graphics
- supports 24 bits of color information (True Color)
- Most commonly used for photographs and similar continuous-tone bitmap images.
- stores all of the color information in an RGB image, then reduces the file size by compressing it.
- lossy compression (caused by Quantization)
Uses the .jpg or .jpeg file extension.
In the JPEG Compression Standard, Involved processes include :
- Block Subdivision
- DCT (Discrete Cosine Transform)-based image coding system
- Quantization
- Entropy Coding
Block Subdivision
In transform image coding, an image is divided into a number of 2D blocks.
- Each block is then coded separately.
- Size of block depends on applications: typical block size: 8x8
Does block size matter?
- We use blocks to try to reduce the spatial redundancy.
- If you use a larger block, you can take care of many redundant pixels at once.
- If you use a small block, it costs less computational effort.
Why is the typical block size as small as 8x8?
- Researchers found it to be the best trade-off in practice.
Advantages of small blocks size
- Costs less computational effort.
- Transform coder is more adaptive to local image property.
- Does not impose significant memory requirements
Disadvantages of small blocks size
- Correlation among neighboring blocks increases as the block size decreases.
- This diminishes the performance of transform image coding, which relies on the strong correlation between intensities of spatially close pixels within a frame to compress each block.
2-D Discrete Cosine Transform (DCT)
Transform coding (DCT):
- Data in spatial domain is converted to a frequency domain in which they can be more efficiently encoded.
The general formula of the forward DCT, from the spatial domain $(x,y)$ to the transform domain $(i,j)$ for an $N \times N$ block:
$$F(i,j) = \frac{2}{N}\, C(i)\,C(j) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) \cos\frac{(2x+1)i\pi}{2N} \cos\frac{(2y+1)j\pi}{2N}$$
where $f(x,y)$ is the pixel intensity in the block at location $(x,y)$,
and $C(k) = 1/\sqrt{2}$ for $k = 0$, $C(k) = 1$ otherwise.
8x8 block Monochrome Example
Transform each 8x8 block to a frequency domain in which they can be more efficiently encoded.
Forward DCT (FDCT):
Encoding:
$$F(i,j) = \frac{1}{4}\, C(i)\,C(j) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\frac{(2x+1)i\pi}{16} \cos\frac{(2y+1)j\pi}{16}$$
where $f(x,y)$ is the pixel intensity in the 8x8 block at location $(x,y)$,
and $C(k) = 1/\sqrt{2}$ for $k = 0$, $C(k) = 1$ otherwise.
$F(0,0)$ is known as the DC term (proportional to the average intensity of the block).
Inverse DCT (IDCT):
Decoding:
$$f(x,y) = \frac{1}{4} \sum_{i=0}^{7} \sum_{j=0}^{7} C(i)\,C(j)\,F(i,j) \cos\frac{(2x+1)i\pi}{16} \cos\frac{(2y+1)j\pi}{16}$$
where $F(i,j)$ is the DCT coefficient at location $(i,j)$.
Visualization - 8x8 block Monochrome Example
- F(0,0) is known as DC coefficient ( = average intensity of the block).
- Others are known as AC coefficients.
- Most AC coefficients have zero or near-zero values.
- Zero or near-zero AC coefficients mean there is a lot of redundancy.
- Higher-frequency coefficients tend to be closer to zero.
In the coefficient-magnitude image, we discard the dark parts (low or zero values) and keep only the bright parts.
When we reconstruct the image from the kept DCT coefficients, we get back approximately the original spatial-domain image.
If the 8x8 block is pure white, the DCT result shows only a white DC coefficient while all AC coefficients are black (zero), because the block is completely redundant. (The sketch below reproduces this.)
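The 8x8 DCT is easy to experiment with in SciPy; the orthonormal 2-D DCT below uses the same scaling as the FDCT formula above:

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    # 2-D DCT = 1-D DCT along rows, then along columns.
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def idct2(coeffs):
    return idct(idct(coeffs, axis=0, norm="ortho"), axis=1, norm="ortho")

flat = np.full((8, 8), 255.0)               # a pure white 8x8 block
F = dct2(flat)
print(F[0, 0])                              # DC term: ~2040 = 8 x average intensity
print(round(np.abs(F).sum() - F[0, 0], 6))  # all AC coefficients ~0: pure redundancy
print(np.allclose(idct2(F), flat))          # the DCT itself is lossless;
                                            # only quantization discards information
```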
Major functions of transformation
- To make as many transform coefficients as possible small enough so that they are insignificant and need not be coded for transmission.
- To minimize statistical dependence, i.e., correlation between the transform coefficients.
- The relatively good capability for bit-rate reduction mainly comes from two mechanisms:
- Not all transform coefficients need to be transmitted.
- Coefficients need not be coded with full accuracy.
Quantization
We need to do Quantization before Encoding.
- The purpose of quantization is to discard information which is not visually significant.
All DCT coefficients are quantized using a 64-element quantization table:
$$F^Q(i,j) = \operatorname{round}\!\left(\frac{F(i,j)}{Q(i,j)}\right)$$
where $Q(i,j)$ are quantization coefficients specified by the quantization table.
8x8 block Monochrome Example
- You can see the table values gradually increase with frequency.
- High frequency components are less visually significant
- Coarser quantization step for higher frequency coefficients
- high freq coefficients are more likely to be zero after quantization
For de-quantization:
$$F'(i,j) = F^Q(i,j) \times Q(i,j)$$
Note we need to do de-quantization in the decoder in order to reconstruct the image.
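A sketch of the quantize/de-quantize pair. The table below is the example luminance quantization table from Annex K of the JPEG standard; note the coarser (larger) steps toward the high-frequency corner:

```python
import numpy as np

# Example luminance quantization table (JPEG standard, Annex K).
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize(F):
    """F_q(i,j) = round(F(i,j) / Q(i,j)) -- the only lossy step in JPEG."""
    return np.round(F / Q).astype(int)

def dequantize(Fq):
    """F'(i,j) = F_q(i,j) * Q(i,j): approximate reconstruction in the decoder."""
    return Fq * Q

# e.g. a coefficient of 52 at (0,0): round(52/16) = 3; the decoder gets 3*16 = 48
```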
Entropy encoding
DC and AC coefficients are encoded differently.
Entropy encoding comprises 4 techniques:
- Vectoring (zig-zag scanning)
- Differential coding
- Run-length coding
- Huffman coding
Vectoring - Zig-Zag Scanning
- Place the lower-frequency coefficients before the higher ones
- to generate more consecutive zeros
- and permit the efficient use of run-length coding (a sketch of the scan order follows below).
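A sketch that generates the zig-zag visiting order for an n x n block:

```python
def zigzag_order(n=8):
    """(row, col) order scanning from low to high frequency along
    anti-diagonals, alternating direction on each diagonal."""
    order = []
    for s in range(2 * n - 1):          # s = row + col is constant per diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

print(zigzag_order()[:6])  # [(0,0), (0,1), (1,0), (2,0), (1,1), (0,2)]
```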
Coding DC coefficients - Differential Predictive Coding
Differential coding: used when the amplitude of the values that make up the source information cover a large range but the difference between successive values is relatively small.
There is usually a strong correlation between dc coefficients of adjacent NxN blocks
- The DC coefficients are encoded using the predictive coding techniques.
- Compute DC coefficient difference between 2 consecutive blocks.
- If it's the 1st block, assume there is a virtual preceding block having DC coefficient 0,
- i.e. the "difference" for the 1st block is just its entire DC coefficient.
- Convert the difference value into an intermediate symbol.
- Encode the intermediate symbol using Huffman (or arithmetic) coding.
Intermediate symbol conversion of DC Coefficients:
Each DC coefficient difference value is represented by a pair of symbols:
- symbol-1 (SIZE) i.e. the number of bits needed to encode AMPLITUDE
- symbol-2 (AMPLITUDE) i.e. the amplitude of the difference
- The range of DC coefficient difference [-2048, 2047]
Symbol-1 is encoded with the following Huffman codeword table, where
$$\text{SIZE} = \lceil \log_2(|\text{DIFF}| + 1) \rceil$$
and DIFF is the DC difference value.
Symbol-2 is encoded in a SIZE-bit plain binary code format (1's complement if negative).
That is, the binary code of the difference value.
Example:
- Difference value = 14 -> 1110
- Difference value = -14 -> 1’s complement of 1110 -> 0001
- Difference value = 0 -> encoded symbol-2 is empty (no amplitude bits are sent)
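A sketch of the DC intermediate-symbol conversion, reproducing the examples above. For positive values, Python's bit_length() equals ceil(log2(value + 1)):

```python
def dc_symbols(diff):
    """Convert a DC difference into (SIZE, amplitude bits).
    Negative values are sent as the 1's complement of |diff|."""
    if diff == 0:
        return 0, ""                    # SIZE 0: no amplitude bits follow
    size = abs(diff).bit_length()       # = ceil(log2(|diff| + 1))
    bits = format(abs(diff), f"0{size}b")
    if diff < 0:                        # 1's complement: flip every bit
        bits = "".join("1" if b == "0" else "0" for b in bits)
    return size, bits

print(dc_symbols(14))    # (4, '1110')
print(dc_symbols(-14))   # (4, '0001')
print(dc_symbols(3))     # (2, '11') -- the DIFF = 3 case used later
```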
Examples of Huffman coding for DC coefficients
Coding AC coefficient
For AC coefficients:
- Convert zig-zag scanned ac coefficients into an intermediate symbol sequence.
- Encode the intermediate symbol sequence using Huffman (or arithmetic) coding.
Intermediate symbol conversion of AC Coefficients:
Each non-zero AC coefficient is represented by a pair of symbols:
- symbol-1 (RUNLENGTH, SIZE), the pair that is assigned a codeword
- RUNLENGTH = the number of consecutive zero-valued ac coefficients preceding the non-zero ac coefficient [0,15]
- SIZE = the minimum number of bits used to encode AMPLITUDE [0 to 10 bits]
- the minimum number of bits required to represent the absolute value of the AC coefficient in binary format.
- symbol-2 (AMPLITUDE)
- AMPLITUDE = the amplitude of the non-zero AC coefficient, in the range [-1023, 1024]
Example: if the zig-zag scanned sequence of AC coefficients is {0, 0, 0, 0, 0, 0, 476}
=> (6, 9)(476)
where 6 is the count of consecutive zeros, and SIZE $= \lceil \log_2(476+1) \rceil = 9$ (round up).
To determine the SIZE value for an AC coefficient:
$$\text{SIZE} = \lceil \log_2(|AC| + 1) \rceil$$
where $AC$ is the non-zero AC coefficient.
- SIZE = 0 never occurs for an AC coefficient; it is reserved for the DC side.
- Basically DC and AC share the same SIZE definition.
If RUNLENGTH is > 15:
- use symbol-1 (15,0) to represent RUNLENGTH = 16
Example: if the sequence of ac coefficients is {0,0…0,12} where there are 39 0s,
=> (15,0)(15,0)(7,4)(12)
- After coding the last non-0 ac coefficient in a block, use symbol-1 (0,0) => End Of Block (EOB)
- EOB usually occurs well before the mid-point of the array!
There is a Huffman codeword table for the AC coefficients:
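A sketch of the AC intermediate-symbol conversion, including the (15, 0) zero-run marker and the (0, 0) End Of Block symbol:

```python
def ac_symbols(zigzag_acs):
    """Convert zig-zag-ordered AC coefficients into
    (RUNLENGTH, SIZE)(AMPLITUDE) intermediate symbols."""
    symbols, run = [], 0
    last = len(zigzag_acs)              # trim trailing zeros: EOB covers them
    while last > 0 and zigzag_acs[last - 1] == 0:
        last -= 1
    for coeff in zigzag_acs[:last]:
        if coeff == 0:
            run += 1
            if run == 16:
                symbols.append(((15, 0), None))   # 16 consecutive zeros
                run = 0
        else:
            size = abs(coeff).bit_length()        # = ceil(log2(|coeff| + 1))
            symbols.append(((run, size), coeff))
            run = 0
    symbols.append(((0, 0), None))                # End Of Block
    return symbols

print(ac_symbols([0] * 6 + [476]))   # [((6, 9), 476), ((0, 0), None)]
print(ac_symbols([0] * 39 + [12]))   # [((15, 0), None), ((15, 0), None),
                                     #  ((7, 4), 12), ((0, 0), None)]
```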
Full Example - Coding DC and AC
We actually achieve a lot of compression here.
DC term is encoded first.
Here we assume the quantized DC term of the previous block is 76.
So DC DIFF = 79 - 76 = 3, therefore AMPLITUDE = 3.
For SIZE, we compute:
$$\text{SIZE} = \lceil \log_2(|\text{DIFF}| + 1) \rceil = \lceil \log_2 4 \rceil = 2$$
where DIFF is the DC difference value.
Therefore the DC term 79 is represented as (2)(3).
Then we encode the AC terms.
- Therefore the final symbols are: (2)(3), (1,2)(-2), (0,1)(-1), (0,1)(-1), (0,1)(-1), (2,1)(-1), (0,0)
Finally, we use the Symbols to convert to Codewords:
- Therefore the output bit stream: (011)(11), (11011)(01), (00)(0), (00)(0), (00)(0), (11100)(0), (1010)
- A total of 31 bits is required to represent 64 pixels.
- Output bit rate = 31/64 ≈ 0.484 bits/pixel
- Compression ratio = 8/0.484 ≈ 16.5:1
- We stated 8 bits per pixel here because the image is monochrome (grayscale)
JPEG Encoder / Decoder Schematic
JPEG Encoder
JPEG Decoder
Handling Color Image
In the previous example, we saw how monochrome images are handled.
How about Color images?
- For RGB, each channel is processed separately (i.e. there will be 3 matrices, each divided into 8x8 blocks).
- For YCbCr, each channel is also processed separately. Note Cb and Cr are smaller than Y because they have already been subsampled.
To Handle Color Images, there are 2 Approaches.
- Approach 1: Handle R, G and B plane separately
- Approach 2: Transform (RGB) to another representation
JPEG File Format
For decoding, it is necessary to delimit each field and set of table values in a defined way
Each JPEG file comes with a frame header, and each JPEG contains either a single scan or multiple scans.
The Frame Header contains:
- The overall width and height of the image in pixels
- The number and type of components that are used to represent the image (RGB or YUV)
- The digitization format (4:2:2, 4:2:0, etc.)
The Scan Header contains:
- The identity of the components (R/G/B, etc)
- The number of bits used to digitize each component
- The quantization table of values that have been used to encode each component
- Each scan comprises one or more segments:
- Each can contain a group of blocks preceded by a header.
- Each segment can be decoded independently of the others, which prevents bit errors from propagating and affecting other segments.
Various modes of JPEG
The JPEG standard supports 4 modes
- Sequential
- Default mode (Single Scan)
- Progressive mode (Multiple Scan)
- Provides a low-quality version quickly (to reduce the delay!)
- and then extra information to improve the quality.
- Hierarchical mode (Multiple Scan)
- Provides a low-resolution version quickly (to reduce the delay!)
- and then extra information to improve the resolution.
- Lossless (JPEG-LS)
- No Transform coding
- Use differential coding technique
Progressive JPEG Compression
- Produce quickly a rough image, and then improve its quality using multiple scans.
The progressive JPEG mode of operation produces a sequence of scans, each scan coding a subset of DCT coefficients.
- Example 1: DC & low frequency AC coefficients in the 1st scan and then high frequency AC coefficients in the 2nd scan.
- Example 2: encode the first few MSBs (e.g. bits 7-5) of the quantized DCT coefficients in one scan, and then the LSBs in later scans.
Hierarchical JPEG Compression
- In this mode, the total image is first sent at a lower resolution and then at a higher resolution.