Digital Image Processing - Image Compression
Digital Image Compression and Coding Standards
The major objective of image coding is to represent an image with as few bits as possible while preserving the level of quality and intelligibility required for the given application.
- Image coding exploits the spatial redundancy in the image.
Applications:
- Reduction of channel bandwidth required for image transmission.
- e.g. Video phone, Fax machine, Broadcast TV, HDTV, remote sensing via satellite, videoconferencing, etc.
- Reduction of the storage requirement for image storage.
- e.g. Storage of image data, graphic database, weather map, medical images, etc.
Criteria of quality (to measure the quality of the image):
- Objective criteria - MSE (Mean Square Error), etc.
- Subjective criteria - sharpness, etc.
Image quality is judged with respect to human observation, so it is largely a subjective matter.
Image compression system model
Source coding
- Compress data.
- Remove redundancy from data prior to transmission.
- e.g. Transform coding etc.
Channel coding
Telecommunication people worry about this; we won't go into it here.
- Allow detection or correction of transmission error.
- Add redundancy to data.
- e.g. HDLC etc.
Classification of image coding
Error-free compression:
Does not lose any information; we just reduce the redundancy.
- Bit-plane coding
- JBIG coding (a form of arithmetic coding developed by IBM)
- Run-length coding
Lossy compression:
Some less important information will be lost.
- Predictive coding
- Transform coding (JPEG)
- Subband coding
- Pyramid coding
- Vector quantization coding
- Block truncation coding
- Fractal coding
- Wavelet coding (JPEG2000)
Now we delve into source coding.
Image source coding model
In the source coding stage (we do not look at the channel coding stage here):
The encoder compresses the image; the decoder then reconstructs it.
- Compression stage: Mapper -> Quantizer -> Symbol Encoder
- Decompression stage: Symbol Decoder -> Inverse Mapper
Mapper
Maps the input set of data (pixel intensity) into another set of data which is easier to process.
- To save space
Examples :
We scan line by line, counting runs of white pixels and black pixels.
- {number of white pixels, number of black pixels}: always starting with a white run.
- E.g. {3, 4}, {4, 1} means 3 white pixels, 4 black pixels, then 4 white pixels, 1 black pixel (look at the first line).
- EOL means End of Line. If the remaining pixels are all white, we can just use EOL as the notation.
- In the last line there are 3 black pixels at first, so it is {0, 3}: 0 white, 3 black. (A code sketch follows below.)
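As a rough illustration, the run-length mapping above can be sketched in a few lines of Python (assuming 1 = white and 0 = black, and producing a plain list of run counts rather than actual codewords):

```python
def run_length_encode(line):
    """Encode a binary scan line (1 = white, 0 = black) as alternating
    white/black run counts, always starting with a white run."""
    runs = []
    current = 1        # by convention, the first run counts white pixels
    count = 0
    for pixel in line:
        if pixel == current:
            count += 1
        else:
            runs.append(count)   # close the run (0 if the line starts black)
            current = pixel
            count = 1
    runs.append(count)
    return runs

print(run_length_encode([1, 1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0]))  # [3, 4, 4, 1]
print(run_length_encode([0, 0, 0, 1]))                           # [0, 3, 1]
```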
The DCT is like the DFT, but it produces only real coefficients (there is no imaginary part to keep).
- In the DCT domain, we keep the low-frequency coefficients.
- Normally the high-frequency coefficients can be ignored.
Quantizer
Quantizes the mapper output to a limited number of possible values.
- E.g. convert float values back to integers after transform coding
Examples :
- In the previous example, we used transform coding to MAP the integer values 0-255 into floating-point values.
- Since floating-point values are not friendly, we quantize the coefficients,
- i.e. limit the number of possible output levels so that they fit back in 0-255, and 8 bits are enough to store each pixel.
- Some information is lost.
- Originally, each pixel occupies 24 bits because of RGB (full color).
- We reduce the total number of colors by selecting an 8-bit subset of colors to store (based on an MSE criterion).
- Colors are selected through clustering (e.g. k-means), picking the mean of each cluster.
- The selected colors form a palette (which can use fewer than 8 bits).
- The palette acts as a look-up table.
- Some information is lost. (A small uniform-quantizer sketch follows below.)
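A minimal sketch of the quantization step, assuming a simple uniform step size (real coders pick a step per coefficient, as JPEG does later in these notes):

```python
import numpy as np

def uniform_quantize(coefficients, step):
    """Map floating-point coefficients onto integer levels.
    All values within one step collapse to the same level: information is lost."""
    return np.round(np.asarray(coefficients) / step).astype(int)

def dequantize(levels, step):
    """Approximately reconstruct coefficient values from the integer levels."""
    return np.asarray(levels) * step

coeffs = np.array([12.7, -3.2, 0.4, 101.9])
levels = uniform_quantize(coeffs, step=4.0)
print(levels)                   # [ 3 -1  0 25]
print(dequantize(levels, 4.0))  # [ 12.  -4.   0. 100.]  (not the original values)
```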
Coder
Performs codeword assignment to the quantizer output.
- If a combination occurs more often, a shorter codeword is assigned.
- If a combination occurs less often, a longer codeword is assigned.
- Codeword assignment is based on probability.
- The codewords are always unique.
- EOL can also be assigned a codeword if it is frequently used.
This kind of probability-based coding is called entropy coding.
Entropy Coding
Based on Probability.
Uniform-length codeword assignment
- Each pixel uses the same number of bits.
- e.g. 8 bits per index
- The average bit rate is simply the fixed codeword length.
- Simple and easy to implement.
- Not very efficient.
Variable-length codeword assignment
- Requires the concept of entropy.
- Coding methods include:
- Huffman coding
- Arithmetic coding
- Shift code
Entropy
Entropy is a measure of the amount of information associated with the set of coder input values and gives a lower bound on the average number of bits required to code these inputs.
Entropy $H$ of a random variable taking values $a_i$, $i = 1, \dots, N$, with corresponding probabilities $p_i$, is given by:
$$H = -\sum_{i=1}^{N} p_i \log_2 p_i$$
- $p_i$ is the probability of $a_i$.
- $a_i$ can be something like {2,0} or {3,0}, depending on what kind of symbol you want to transmit.
- E.g. transmit alphabet letters (i.e. abcdaae...), where each letter is an $a_i$.
- Note that since $p_i \le 1$, $\log_2 p_i$ is negative (or zero). To make the value positive, a "-" sign is added.
Example:
Suppose the probabilities of $a_1, \dots, a_6$ are 0.4, 0.3, 0.1, 0.1, 0.06 and 0.04 respectively.
Then the entropy of these 6 inputs:
$$H = -(0.4\log_2 0.4 + 0.3\log_2 0.3 + 0.1\log_2 0.1 + 0.1\log_2 0.1 + 0.06\log_2 0.06 + 0.04\log_2 0.04) \approx 2.14 \text{ bits}$$
That means on average we need about 2.14 bits to send each symbol.
This is just a theoretical lower bound (verified numerically in the sketch below).
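The computation is easy to check numerically; a minimal sketch using the probabilities from the example above:

```python
import math

def entropy(probabilities):
    """H = -sum(p * log2(p)): a lower bound on average bits per symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

p = [0.4, 0.3, 0.1, 0.1, 0.06, 0.04]   # probabilities of a1..a6 above
print(f"{entropy(p):.2f} bits")        # 2.14
```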
In entropy coding, shorter codewords are assigned to more probable messages while longer codewords are assigned to less probable messages.
- If the probability of that message is high, a shorter codeword is assigned
- If the probability of that message is low, a longer codeword is assigned
Huffman coding
- Using Huffman coding, we can approach this bound and see how many bits each symbol needs.
Huffman encoding
Suppose we have $a_1$ to $a_6$; we first sort them by probability (from highest to lowest).
In each step, we combine the lowest probability with the second-lowest probability, then sort again.
Then we start to assign codewords from the shortest to the longest.
The assignment procedure works backwards through the stages (Stage 4 -> 3 -> 2 -> 1):
- First we take the final, fully combined set.
- Break it into 2; assign 1 to the higher probability and 0 to the lower probability.
- If the 2 segments have the same probability, assign 1 to the one containing more elements.
- If an element is not alone in its segment, break the segment further until it is alone.
- The process ends when all elements have a codeword assigned.
In the accompanying figure, the letters A-F stand for $a_1$-$a_6$.
Note that the resulting average length (2.2 bits/symbol here) is close to the entropy (2.14 bits); the entropy is only a theoretical bound, while Huffman coding gives the rate actually achievable.
To decode Huffman code:
- Use a State machine
- Move 1 step forward when receiving a 0/1.
- Output symbol X and reset the machine when state X is reached.
E.g. 101101010
Start at the initial state and move one step per received bit:
1 -> a2, then reset the state.
0 -> 01 -> 011 -> a1, then reset the state.
0 -> 01 -> 010 -> 0101 -> 01010 -> a3, then reset the state.
Advantages and Disadvantages of Huffman Coding
Huffman coding is one of the most efficient codeword assignments.
Huffman encoding is also used in JPEG compression.
However:
- It is difficult to construct the codebook when the number of inputs is large.
- The probability distribution of the input symbols must be known in advance.
- The codewords cannot be adjusted to handle non-stationary inputs.
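A minimal Huffman construction sketch using a heap, with the example probabilities from above. The exact bit patterns depend on tie-breaking and may differ from the figure, but any Huffman code for this source achieves the same 2.2 bits/symbol average:

```python
import heapq
from itertools import count

def huffman_code(probabilities):
    """Repeatedly merge the two least probable nodes; the branch labels
    collected along the way become the codewords (1 = higher probability)."""
    tiebreak = count()   # avoids comparing dicts when probabilities tie
    heap = [(p, next(tiebreak), {sym: ""}) for sym, p in probabilities.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        p_lo, _, lo = heapq.heappop(heap)   # least probable branch -> 0
        p_hi, _, hi = heapq.heappop(heap)   # next least probable -> 1
        merged = {s: "0" + c for s, c in lo.items()}
        merged.update({s: "1" + c for s, c in hi.items()})
        heapq.heappush(heap, (p_lo + p_hi, next(tiebreak), merged))
    return heap[0][2]

probs = {"a1": 0.4, "a2": 0.3, "a3": 0.1, "a4": 0.1, "a5": 0.06, "a6": 0.04}
codes = huffman_code(probs)
avg = sum(probs[s] * len(codes[s]) for s in probs)
print(codes)                      # a valid optimal prefix code
print(f"{avg:.2f} bits/symbol")   # 2.20, close to the 2.14-bit entropy
```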
Arithmetic coding
- In arithmetic coding, a message is represented by an interval of real numbers between 0 and 1.
- It is special because we don't assign codewords to individual symbols.
Arithmetic coding
Message: $a_1\,a_2\,a_3\,a_3\,a_4$
- Each incoming symbol narrows the range by which the message on hand is represented.
- The final message symbol narrows the range to [0.06752, 0.0688).
- Any number within this subinterval (e.g. 0.068) can be used to represent the message.
- For higher efficiency, one should use the value having the shortest binary form in the range [0.06752, 0.0688).
- How to Convert a decimal fraction to binary
- 0.06752 = 0.000100010XXXX…
- 0.0688 = 0.000100011XXXX…
Therefore the shortest binary form is 0.000100011 -> 9 bits to encode the message.
To decode the Arithmetic coding:
E.g. Decode 0.000100011 (that is 0.068)
0.068 is between 0 ~ 0.2 -> a1
0.068 is between 0.04 ~ 0.08 -> a2
0.068 is between 0.056 ~ 0.072 -> a3
0.068 is between 0.0624 ~ 0.0688 -> a3
0.068 is between 0.0672 ~ 0.0688 -> a4
Therefore the message is $a_1\,a_2\,a_3\,a_3\,a_4$.
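A sketch of the interval-narrowing process, assuming the symbol intervals implied by the decoding steps above (a1: [0, 0.2), a2: [0.2, 0.4), a3: [0.4, 0.8), a4: [0.8, 1.0)). A practical coder uses rescaled integer arithmetic to avoid the floating-point precision limit:

```python
# Cumulative probability intervals assumed from the decoding steps above.
ranges = {"a1": (0.0, 0.2), "a2": (0.2, 0.4), "a3": (0.4, 0.8), "a4": (0.8, 1.0)}

def encode(message):
    """Narrow [0, 1) once per symbol; any value in the final interval
    represents the whole message."""
    low, high = 0.0, 1.0
    for sym in message:
        s_low, s_high = ranges[sym]
        width = high - low
        low, high = low + width * s_low, low + width * s_high
    return low, high

def decode(value, n_symbols):
    """Reverse the narrowing: find which symbol interval the value falls in."""
    message, low, high = [], 0.0, 1.0
    for _ in range(n_symbols):
        scaled = (value - low) / (high - low)
        for sym, (s_low, s_high) in ranges.items():
            if s_low <= scaled < s_high:
                message.append(sym)
                width = high - low
                low, high = low + width * s_low, low + width * s_high
                break
    return message

print(encode(["a1", "a2", "a3", "a3", "a4"]))  # ~(0.06752, 0.0688)
print(decode(0.068, 5))                        # ['a1', 'a2', 'a3', 'a3', 'a4']
```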
Advantages and Disadvantages of Arithmetic Coding
Arithmetic coding does not require a fixed probability distribution of the symbols in advance.
A one-to-one correspondence between source symbols and codewords does not exist.
It can assume any arbitrary distribution and update it as the symbols are transmitted/received in the encoder/decoder. (It's adaptive!)
It is one of the two entropy coding techniques in JPEG (the other is Huffman coding).
However:
- As the message becomes longer, the interval needed to represent it becomes smaller, and more bits are needed to specify that interval.
Shift code
Also known as the $S_n$ code, where $n$ is the segment length.
- Codewords are built by concatenating binary segments of fixed length $n$.
- There are $2^n$ distinct $n$-bit codewords.
- $2^n - 1$ of these codewords are assigned to the first $2^n - 1$ input values.
- The last codeword (all 1s) is used to signify that the input is outside this range.
Example: implementation of the $S_2$ (2-bit) code
There are 4 distinct 2-bit codewords.
- The 4 distinct codewords: 00, 01, 10, 11
- 00, 01 and 10 are assigned to the first 3 input values:
- $a_1 \to 00$, $a_2 \to 01$, $a_3 \to 10$
- 11 is used to signify that the input is outside this range (the shift codeword).
- The shift codeword is prepended to 00, 01 and 10 respectively:
- $a_4 \to 1100$, $a_5 \to 1101$, $a_6 \to 1110$, ...
- This process is repeated until all inputs are assigned a codeword.
Example: implementation of the $S_3$ (3-bit) code
There are 8 distinct 3-bit codewords.
- The 8 distinct codewords: 000, 001, 010, 011, 100, 101, 110, 111
- The first 7 codewords are assigned to the first 7 input values.
- 111 is used to signify that the input is outside this range (the shift codeword).
- The shift codeword is prepended to the 7 codewords respectively.
- This process is repeated until all inputs are assigned a codeword.
Advantages of Shift code
- easy to implement
- reasonably efficient for inputs with monotonically decreasing probabilities
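A small sketch that generates shift codewords for segment length n, reproducing the 2-bit example above:

```python
def shift_code(num_inputs, n):
    """First 2**n - 1 n-bit patterns are codewords; the all-ones pattern
    means 'shift': prepend it once per group of 2**n - 1 further inputs."""
    base = [format(k, f"0{n}b") for k in range(2**n - 1)]   # e.g. 00, 01, 10
    shift = "1" * n                                          # e.g. 11
    codewords, prefix = [], ""
    while len(codewords) < num_inputs:
        for word in base:
            codewords.append(prefix + word)
            if len(codewords) == num_inputs:
                break
        prefix += shift
    return codewords

print(shift_code(7, 2))
# ['00', '01', '10', '1100', '1101', '1110', '111100']
```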
Comparison among different codes
JPEG standard
Joint Photographic Experts Group
- Intraframe coding method for still images
- Flexible image dimensions
- Each component coded separately; any colorspace possible, but best compression for YUV
- Flexible compression ratio
- A compression ratio of 24:1 without loss of subjective quality is claimed.
JPEG is:
- popular format for Web graphics
- supports 24 bits of color information (True Color)
- Most commonly used for photographs and similar continuous-tone bitmap images.
- stores all of the color information in an RGB image, then reduces the file size by compressing it.
- lossy compression (caused by Quantization)
Uses the .jpg or .jpeg file extension.
In the JPEG Compression Standard, Involved processes include :
- Block Subdivision
- DCT (Discrete Cosine Transform)-based image coding system
- Quantization
- Entropy Coding
Block Subdivision
In transform image coding, an image is divided into a number of 2D blocks.
- Each block is then coded separately.
- Size of block depends on applications: typical block size: 8x8
Does block size matter?
- We use blocks to try to reduce the spatial redundancy.
- If you use a larger block, you can take care of many redundant pixels at once.
- If you use a small block, it costs less computational effort.
Why is the typical block size as small as 8x8?
- Researchers found it to be the best trade-off in practice.
Advantages of small blocks size
- Costs less computational effort.
- Transform coder is more adaptive to local image property.
- Does not impose significant memory requirements
Disadvantages of small blocks size
- Correlation among neighboring blocks increases as the block size decreases.
- This diminishes the performance of transform image coding, which relies on the strong correlation between intensities of spatially close pixels within a frame to compress each block.
2-D Discrete Cosine Transform (DCT)
Transform coding (DCT):
- Data in spatial domain is converted to a frequency domain in which they can be more efficiently encoded.
The general formula of the forward DCT, from the spatial domain $(x,y)$ to the transform domain $(i,j)$ for an $N \times N$ block:
$$F(i,j) = \frac{2}{N}\, C(i)\,C(j) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) \cos\frac{(2x+1)i\pi}{2N} \cos\frac{(2y+1)j\pi}{2N}$$
where $f(x,y)$ is the pixel intensity in the block at location $(x,y)$,
and $C(k) = 1/\sqrt{2}$ for $k = 0$, $C(k) = 1$ otherwise.
8x8 block Monochrome Example
Transform each 8x8 block to a frequency domain in which they can be more efficiently encoded.
Forward DCT (FDCT):
Encoding:
$$F(i,j) = \frac{1}{4}\, C(i)\,C(j) \sum_{x=0}^{7} \sum_{y=0}^{7} f(x,y) \cos\frac{(2x+1)i\pi}{16} \cos\frac{(2y+1)j\pi}{16}$$
where $f(x,y)$ is the pixel intensity in the 8x8 block at location $(x,y)$,
and $C(k) = 1/\sqrt{2}$ for $k = 0$, $C(k) = 1$ otherwise.
$F(0,0)$ is known as the DC term (proportional to the average intensity of the block).
Inverse DCT (IDCT):
Decoding:
$$f(x,y) = \frac{1}{4} \sum_{i=0}^{7} \sum_{j=0}^{7} C(i)\,C(j)\,F(i,j) \cos\frac{(2x+1)i\pi}{16} \cos\frac{(2y+1)j\pi}{16}$$
where $F(i,j)$ is the DCT coefficient at location $(i,j)$.
Visualization - 8x8 block Monochrome Example
- F(0,0) is known as DC coefficient ( = average intensity of the block).
- Others are known as AC coefficients.
- Most AC coefficients have zero or near-zero values.
- Zero or near-zero AC coefficients mean there is a lot of redundancy.
- Higher-frequency coefficients tend to be closer to zero.
In the coefficient-magnitude image, we discard the dark parts (low or zero values) and keep only the bright parts.
When we reconstruct the image from the kept DCT coefficients, we get back approximately the original spatial-domain image.
If the 8x8 block is pure white, the DCT result shows only a white DC coefficient while all AC coefficients are black (zero), because the block is completely redundant. (The sketch below reproduces this.)
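The 8x8 DCT is easy to experiment with in SciPy; the orthonormal 2-D DCT below uses the same scaling as the FDCT formula above:

```python
import numpy as np
from scipy.fftpack import dct, idct

def dct2(block):
    # 2-D DCT = 1-D DCT along rows, then along columns.
    return dct(dct(block, axis=0, norm="ortho"), axis=1, norm="ortho")

def idct2(coeffs):
    return idct(idct(coeffs, axis=0, norm="ortho"), axis=1, norm="ortho")

flat = np.full((8, 8), 255.0)               # a pure white 8x8 block
F = dct2(flat)
print(F[0, 0])                              # DC term: ~2040 = 8 x average intensity
print(round(np.abs(F).sum() - F[0, 0], 6))  # all AC coefficients ~0: pure redundancy
print(np.allclose(idct2(F), flat))          # the DCT itself is lossless;
                                            # only quantization discards information
```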
Major functions of transformation
- To make as many transform coefficients as possible small enough so that they are insignificant and need not be coded for transmission.
- To minimize statistical dependence, i.e., correlation between the transform coefficients.
- The relatively good capability for bit-rate reduction mainly comes from two mechanisms:
- Not all transform coefficients need to be transmitted.
- Coefficients need not be coded with full accuracy.
Quantization
We need to do Quantization before Encoding.
- The purpose of quantization is to discard information which is not visually significant.
All DCT coefficients are quantized using a 64-element quantization table:
$$F^Q(i,j) = \operatorname{round}\!\left(\frac{F(i,j)}{Q(i,j)}\right)$$
where $Q(i,j)$ are quantization coefficients specified by the quantization table.
8x8 block Monochrome Example
- You can see the table values gradually increase with frequency.
- High frequency components are less visually significant
- Coarser quantization step for higher frequency coefficients
- high freq coefficients are more likely to be zero after quantization
For de-quantization:
$$F'(i,j) = F^Q(i,j) \times Q(i,j)$$
Note we need to do de-quantization in the decoder in order to reconstruct the image.
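A sketch of the quantize/de-quantize pair. The table below is the example luminance quantization table from Annex K of the JPEG standard; note the coarser (larger) steps toward the high-frequency corner:

```python
import numpy as np

# Example luminance quantization table (JPEG standard, Annex K).
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize(F):
    """F_q(i,j) = round(F(i,j) / Q(i,j)) -- the only lossy step in JPEG."""
    return np.round(F / Q).astype(int)

def dequantize(Fq):
    """F'(i,j) = F_q(i,j) * Q(i,j): approximate reconstruction in the decoder."""
    return Fq * Q

# e.g. a coefficient of 52 at (0,0): round(52/16) = 3; the decoder gets 3*16 = 48
```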
Entropy encoding
DC and AC coefficients are encoded differently.
Entropy encoding comprises 4 techniques:
- Vectoring (zig-zag scanning)
- Differential coding
- Run-length coding
- Huffman coding
Vectoring - Zig-Zag Scanning
- Place the lower-frequency coefficients before the higher ones
- to generate more consecutive zeros
- and permit the efficient use of run-length coding (a sketch of the scan order follows below).
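A sketch that generates the zig-zag visiting order for an n x n block:

```python
def zigzag_order(n=8):
    """(row, col) order scanning from low to high frequency along
    anti-diagonals, alternating direction on each diagonal."""
    order = []
    for s in range(2 * n - 1):          # s = row + col is constant per diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

print(zigzag_order()[:6])  # [(0,0), (0,1), (1,0), (2,0), (1,1), (0,2)]
```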
Coding DC coefficients - Differential Predictive Coding
Differential coding: used when the amplitude of the values that make up the source information cover a large range but the difference between successive values is relatively small.
There is usually a strong correlation between dc coefficients of adjacent NxN blocks
- The DC coefficients are encoded using the predictive coding techniques.
- Compute DC coefficient difference between 2 consecutive blocks.
- If it's the 1st block, assume there is a virtual preceding block having DC coefficient 0,
- i.e. the "difference" for the 1st block is just its entire DC coefficient.
- Convert the difference value into an intermediate symbol.
- Encode the intermediate symbol using Huffman (or arithmetic) coding.
Intermediate symbol conversion of DC Coefficients:
Each DC coefficient difference value is represented by a pair of symbols:
- symbol-1 (SIZE) i.e. the number of bits needed to encode AMPLITUDE
- symbol-2 (AMPLITUDE) i.e. the amplitude of the difference
- The range of DC coefficient difference [-2048, 2047]
Symbol-1 is encoded with the following Huffman codeword table, where
$$\text{SIZE} = \lceil \log_2(|\text{DIFF}| + 1) \rceil$$
and DIFF is the DC difference value.
Symbol-2 is encoded in a SIZE-bit plain binary code format (1's complement if negative).
That is, the binary code of the difference value.
Example:
- Difference value = 14 -> 1110
- Difference value = -14 -> 1’s complement of 1110 -> 0001
- Difference value = 0 -> encoded symbol-2 is empty (no amplitude bits are sent)
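A sketch of the DC intermediate-symbol conversion, reproducing the examples above. For positive values, Python's bit_length() equals ceil(log2(value + 1)):

```python
def dc_symbols(diff):
    """Convert a DC difference into (SIZE, amplitude bits).
    Negative values are sent as the 1's complement of |diff|."""
    if diff == 0:
        return 0, ""                    # SIZE 0: no amplitude bits follow
    size = abs(diff).bit_length()       # = ceil(log2(|diff| + 1))
    bits = format(abs(diff), f"0{size}b")
    if diff < 0:                        # 1's complement: flip every bit
        bits = "".join("1" if b == "0" else "0" for b in bits)
    return size, bits

print(dc_symbols(14))    # (4, '1110')
print(dc_symbols(-14))   # (4, '0001')
print(dc_symbols(3))     # (2, '11') -- the DIFF = 3 case used later
```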
Examples of Huffman coding for DC coefficients
Coding AC coefficient
For AC coefficients:
- Convert zig-zag scanned ac coefficients into an intermediate symbol sequence.
- Encode the intermediate symbol sequence using Huffman (or arithmetic) coding.
Intermediate symbol conversion of AC Coefficients:
Each non-zero AC coefficient is represented by a pair of symbols:
- symbol-1 (RUNLENGTH, SIZE), the pair that is assigned a codeword
- RUNLENGTH = the number of consecutive zero-valued ac coefficients preceding the non-zero ac coefficient [0,15]
- SIZE = the minimum number of bits used to encode AMPLITUDE [0 to 10 bits]
- the minimum number of bits required to represent the absolute value of the AC coefficient in binary format.
- symbol-2 (AMPLITUDE)
- AMPLITUDE = the amplitude of the non-zero AC coefficient, in the range [-1023, 1024]
Example: if the zig-zag scanned sequence of AC coefficients is {0, 0, 0, 0, 0, 0, 476}
=> (6, 9)(476)
where 6 is the count of consecutive zeros, and SIZE $= \lceil \log_2(476+1) \rceil = 9$ (round up).
To determine the SIZE value for an AC coefficient:
$$\text{SIZE} = \lceil \log_2(|AC| + 1) \rceil$$
where $AC$ is the non-zero AC coefficient.
- SIZE = 0 never occurs for an AC coefficient; it is reserved for the DC side.
- Basically DC and AC share the same SIZE definition.
If RUNLENGTH is > 15:
- use symbol-1 (15,0) to represent RUNLENGTH = 16
Example: if the sequence of ac coefficients is {0,0…0,12} where there are 39 0s,
=> (15,0)(15,0)(7,4)(12)
- After coding the last non-0 ac coefficient in a block, use symbol-1 (0,0) => End Of Block (EOB)
- EOB usually occurs well before the mid-point of the array!
There is a Huffman codeword table for the AC coefficients:
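A sketch of the AC intermediate-symbol conversion, including the (15, 0) zero-run marker and the (0, 0) End Of Block symbol:

```python
def ac_symbols(zigzag_acs):
    """Convert zig-zag-ordered AC coefficients into
    (RUNLENGTH, SIZE)(AMPLITUDE) intermediate symbols."""
    symbols, run = [], 0
    last = len(zigzag_acs)              # trim trailing zeros: EOB covers them
    while last > 0 and zigzag_acs[last - 1] == 0:
        last -= 1
    for coeff in zigzag_acs[:last]:
        if coeff == 0:
            run += 1
            if run == 16:
                symbols.append(((15, 0), None))   # 16 consecutive zeros
                run = 0
        else:
            size = abs(coeff).bit_length()        # = ceil(log2(|coeff| + 1))
            symbols.append(((run, size), coeff))
            run = 0
    symbols.append(((0, 0), None))                # End Of Block
    return symbols

print(ac_symbols([0] * 6 + [476]))   # [((6, 9), 476), ((0, 0), None)]
print(ac_symbols([0] * 39 + [12]))   # [((15, 0), None), ((15, 0), None),
                                     #  ((7, 4), 12), ((0, 0), None)]
```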
Full Example - Coding DC and AC
We actually achieve a lot of compression here.
DC term is encoded first.
Here we assume the quantized DC term of the previous block is 76.
So DC DIFF = 79 - 76 = 3, therefore AMPLITUDE = 3.
For SIZE, we compute:
$$\text{SIZE} = \lceil \log_2(|\text{DIFF}| + 1) \rceil = \lceil \log_2 4 \rceil = 2$$
where DIFF is the DC difference value.
Therefore the DC term 79 is represented as (2)(3).
Then we encode the AC terms.
- Therefore the final symbols are: (2)(3), (1,2)(-2), (0,1)(-1), (0,1)(-1), (0,1)(-1), (2,1)(-1), (0,0)
Finally, we use the Symbols to convert to Codewords:
- Therefore the output bit stream: (011)(11), (11011)(01), (00)(0), (00)(0), (00)(0), (11100)(0), (1010)
- A total of 31 bits is required to represent 64 pixels.
- Output bit rate = 31/64 ≈ 0.484 bits/pixel
- Compression ratio = 8/0.484 ≈ 16.5:1
- We stated 8 bits per pixel here because the image is monochrome (grayscale)
JPEG Encoder / Decoder Schematic
JPEG Encoder
JPEG Decoder
Handling Color Image
In the previous example, we saw how monochrome images are handled.
How about Color images?
- For RGB, each channel is processed separately (i.e. there will be 3 matrices, each divided into 8x8 blocks).
- For YCbCr, each channel is also processed separately. Note Cb and Cr are smaller than Y because they have already been subsampled.
To Handle Color Images, there are 2 Approaches.
- Approach 1: Handle R, G and B plane separately
- Approach 2: Transform (RGB) to another representation
JPEG File Format
For decoding, it is necessary to delimit each field and set of table values in a defined way
Each JPEG file comes with a frame header, and each JPEG contains either a single scan or multiple scans.
The Frame Header contains:
- The overall width and height of the image in pixels
- The number and type of components that are used to represent the image (RGB or YUV)
- The digitization format (4:2:2, 4:2:0, etc.)
The Scan Header contains:
- The identity of the components (R/G/B, etc)
- The number of bits used to digitize each component
- The quantization table of values that have been used to encode each component
- Each scan comprises one or more segments:
- Each can contain a group of blocks preceded by a header.
- Each segment can be decoded independently of the others, which prevents bit errors from propagating and affecting other segments.
Various modes of JPEG
The JPEG standard supports 4 modes
- Sequential
- Default mode (Single Scan)
- Progressive mode (Multiple Scan)
- Provides a low-quality version quickly (to reduce the delay!)
- and then extra information to improve the quality.
- Hierarchical mode (Multiple Scan)
- Provides a low-resolution version quickly (to reduce the delay!)
- and then extra information to improve the resolution.
- Lossless (JPEG-LS)
- No Transform coding
- Use differential coding technique
Progressive JPEG Compression
- Produce quickly a rough image, and then improve its quality using multiple scans.
The progressive JPEG mode of operation produces a sequence of scans, each scan coding a subset of DCT coefficients.
- Example 1: DC & low frequency AC coefficients in the 1st scan and then high frequency AC coefficients in the 2nd scan.
- Example 2: encode the first few MSBs (e.g. bits 7-5) of the quantized DCT coefficients in one scan, and then the LSBs in later scans.
Hierarchical JPEG Compression
- In this mode, the total image is first sent at a lower resolution and then at a higher resolution.