Image Segmentation

The objective is to subdivide an image into separate regions that are homogeneous with respect to a chosen property such as color, brightness, texture, etc.

Segmentation algorithms are generally based on two basic properties of gray-level values:

  • Discontinuity - isolated points, lines and edges of an image.
  • Similarity - thresholding, region growing, region splitting and merging.

Segmentation Methods:

  • Thresholding
  • Edge-based segmentation
  • Region-based segmentation

Thresholding

One of the most important approaches to image segmentation.

If the background and object pixels have gray levels grouped into two dominant modes, they can easily be separated with a single threshold.

Thresholding - Algorithm Explained

  • 1. Select an initial estimate for $T$ (e.g. the average gray level of the image).
  • 2. Segment the image using $T$ into 2 groups:
    • $G_1$, consisting of pixels with gray levels $> T$
    • $G_2$, consisting of pixels with gray levels $\leq T$
  • 3. Compute the average gray levels $\mu_1$ and $\mu_2$ of the pixels in $G_1$ and $G_2$.
  • 4. Compute a new threshold $T = (\mu_1 + \mu_2)/2$.
  • 5. Repeat steps 2 to 4 until the change in $T$ between successive iterations is smaller than a chosen value.
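A minimal sketch of this iterative selection in Python/NumPy (the synthetic bimodal test image and the stopping tolerance `eps` are assumptions for illustration):

```python
import numpy as np

def iterative_threshold(img, eps=0.5):
    """Basic global threshold selection by iterating T = (mu1 + mu2) / 2."""
    t = img.mean()                        # step 1: initial estimate (image average)
    while True:
        g1 = img[img > t]                 # step 2: group with gray level > T
        g2 = img[img <= t]                # group with gray level <= T
        mu1 = g1.mean() if g1.size else t # step 3: group averages
        mu2 = g2.mean() if g2.size else t
        t_new = (mu1 + mu2) / 2.0         # step 4: new threshold
        if abs(t_new - t) < eps:          # step 5: stop once T stabilizes
            return t_new
        t = t_new

# usage on a synthetic image with two dominant modes
img = np.concatenate([np.random.normal(60, 10, 5000),
                      np.random.normal(180, 10, 5000)]).reshape(100, 100)
binary = img > iterative_threshold(img)
```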

Nonadaptive thresholding (Global thresholding)

Multilevel thresholding is in general less reliable as it is difficult to establish effective thresholds to isolate the regions of interest.

Adaptive Thresholding

Also known as Local thresholding.

  • Image is divided into subimages
  • A local threshold is determined independently in each subimage
  • Each subimage is then processed with respect to its local threshold.

If a threshold can’t be determined in a subimage, it can be interpolated from the thresholds obtained in neighboring subimages.

  • A brighter subimage uses a higher threshold
  • A darker subimage uses a lower threshold
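A minimal sketch of local thresholding, assuming non-overlapping square subimages and the subimage mean as the local threshold (the block size and the mean rule are illustrative choices; the notes leave the local rule open):

```python
import numpy as np

def adaptive_threshold(img, block=32):
    """Local (adaptive) thresholding: one threshold per subimage."""
    out = np.zeros(img.shape, dtype=bool)
    h, w = img.shape
    for r in range(0, h, block):
        for c in range(0, w, block):
            sub = img[r:r + block, c:c + block]
            t = sub.mean()                        # local threshold for this subimage
            out[r:r + block, c:c + block] = sub > t
    return out
```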

Detection of Discontinuities

This is a global approach.

There are 3 basic types of discontinuities:

  • Points (also called islands)
  • Lines
  • Edges (Sharp changes of intensity)

Detection is based on convolving the image with a spatial mask (in general a 3x3 mask).

Point Detection

A point has been detected at the location p(i,j) on which the mask is centered if |R| > T, where T is a nonnegative threshold, and R is obtained with the following mask.

The idea is to use the difference between the gray level of an isolated point and that of its neighbors.

$$\left[\begin{array}{rrr} -1 & -1 & -1 \\ -1 & 8 & -1 \\ -1 & -1 & -1 \end{array}\right]$$

When the threshold is applied, the isolated point (island) stands out.

This technique can also be used for noise removal: apply the mask, threshold the response to locate the noise pixels, then remove them to recover a cleaner version of the original image.
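A sketch of point detection with this mask (SciPy's `convolve` is one assumed way to perform the convolution; the nonnegative threshold `T` is supplied by the user):

```python
import numpy as np
from scipy.ndimage import convolve

point_mask = np.array([[-1, -1, -1],
                       [-1,  8, -1],
                       [-1, -1, -1]])

def detect_points(img, T):
    """Flag pixels where the magnitude of the mask response exceeds T."""
    R = convolve(img.astype(float), point_mask, mode='nearest')
    return np.abs(R) > T
```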

Example:

Line Detection

Line Masks:

How about other angles?

  • Horizontal line + 45° line = 22.5°
  • Horizontal line + -45° line = -22.5°
  • Vertical line + 45° line = 67.5°
  • Vertical line + -45° line = -67.5°
  • etc…

Note

  • Line masks are symmetric, so correlation and convolution give the same result.

To estimate what kind of line passes through a point, apply all 4 masks (horizontal, vertical, 45°, -45°) and compare the results, as sketched below.

If, at a certain point in the image, $|R_a| > |R_b|$ for all $b \neq a$, that point is said to be more likely associated with a line in the direction of mask $a$.
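A sketch of the compare-all-masks idea, using the standard 3x3 line masks (SciPy's `convolve` is an assumed helper):

```python
import numpy as np
from scipy.ndimage import convolve

# standard 3x3 line masks: horizontal, +45 degrees, vertical, -45 degrees
line_masks = [
    np.array([[-1, -1, -1], [ 2,  2,  2], [-1, -1, -1]]),   # horizontal
    np.array([[-1, -1,  2], [-1,  2, -1], [ 2, -1, -1]]),   # +45 degrees
    np.array([[-1,  2, -1], [-1,  2, -1], [-1,  2, -1]]),   # vertical
    np.array([[ 2, -1, -1], [-1,  2, -1], [-1, -1,  2]]),   # -45 degrees
]

def dominant_line_direction(img):
    """Per pixel, the index of the mask with the strongest absolute response."""
    responses = np.stack([np.abs(convolve(img.astype(float), m, mode='nearest'))
                          for m in line_masks])
    return np.argmax(responses, axis=0)   # 0=horizontal, 1=+45, 2=vertical, 3=-45
```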

Edge Detection

The most frequently used discontinuity-based approach.

An edge is a property attached to an individual pixel and is calculated from the image function behavior in a neighborhood of the pixel.

Edges are pixels where brightness changes abruptly.

A change of the image function can be described by a gradient that points in the direction of the largest growth of the image function.

  • Locates sharp changes in the intensity function
  • The magnitude of the first derivative detects the presence of an edge.
    • The 1st derivative shows whether an edge is present (a sharp change)
    • Pixels on the edge have a larger magnitude
  • The sign of the second derivative determines whether an edge pixel lies on the dark side or the light side.
    • The 2nd derivative shows how the edge runs (from dark to bright or from bright to dark)
  1. Look at the gray-level profile of the image
  2. Take the first derivative
  3. Take the second derivative

In practice the case is something like this:

Gradient operator

For a function f(x,y), the gradient of f at coordinates (x’,y’) is defined as the vector

$$\nabla f(x^{\prime}, y^{\prime}) = \left[\begin{array}{c} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \end{array}\right]_{(x^{\prime},\, y^{\prime})}$$

Magnitude of the gradient vector $\nabla f(x^{\prime}, y^{\prime})$:

$$|\nabla f(x^{\prime}, y^{\prime})| = \left.\left(\left(\frac{\partial f}{\partial x}\right)^{2} + \left(\frac{\partial f}{\partial y}\right)^{2}\right)^{1/2}\right|_{(x^{\prime},\, y^{\prime})}$$

Direction of the gradient vector $\nabla f(x^{\prime}, y^{\prime})$:

$$\alpha(x^{\prime}, y^{\prime}) = \left.\tan^{-1}\!\left(\frac{\partial f}{\partial y} \Big/ \frac{\partial f}{\partial x}\right)\right|_{(x^{\prime},\, y^{\prime})}$$

Its magnitude can be approximated in the digital domain in a number of ways, resulting in operators such as the Roberts, Prewitt and Sobel operators for computing its value.

Sobel operators (3x3) - correlation

  • It provides both a differentiating and a smoothing effect, which is particularly attractive as derivatives typically enhance noise.
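A sketch of the Sobel gradient computation (the mask values are the standard 3x3 Sobel kernels; SciPy's `convolve` is an assumed helper, and using correlation instead only flips the sign of the responses):

```python
import numpy as np
from scipy.ndimage import convolve

# 3x3 Sobel masks for the horizontal (x) and vertical (y) derivatives
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]])
sobel_y = np.array([[-1, -2, -1], [0, 0, 0], [1, 2, 1]])

def sobel_gradient(img):
    """Approximate gradient magnitude and direction with the Sobel operators."""
    gx = convolve(img.astype(float), sobel_x, mode='nearest')
    gy = convolve(img.astype(float), sobel_y, mode='nearest')
    magnitude = np.hypot(gx, gy)      # sqrt(gx^2 + gy^2)
    direction = np.arctan2(gy, gx)    # gradient direction in radians
    return magnitude, direction
```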

Laplacian Operator

The Laplacian of a 2D function f(x,y) is a 2nd-order derivative defined as

$$\nabla^{2} f(x^{\prime}, y^{\prime}) = \left.\left(\frac{\partial^{2} f}{\partial x^{2}} + \frac{\partial^{2} f}{\partial y^{2}}\right)\right|_{(x^{\prime},\, y^{\prime})}$$

  • The Laplacian operator has the same properties in all directions and is therefore invariant to rotation in the image.
  • It can also be implemented in digital form in various ways.

For a 3x3 region, the mask is given as

$$\left[\begin{array}{rrr} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{array}\right]$$

The Laplacian operator is generally not as good as the Sobel operator for edge detection, but it still brings out the edges.

Edge linking and boundary detection

Detected edges may be broken, so we need to link them with edge pixels in their neighbourhood.

The techniques of detecting intensity discontinuities yield pixels lying only on the boundary between regions.

In practice, this set of pixels seldom characterizes a boundary completely because of :

  • noise
  • breaks in boundary from nonuniform illumination
  • or other effects that introduce spurious intensity discontinuities.
  • Too large a threshold results in broken edge segments
  • Too small a threshold results in thick edge segments

Edge detection algorithms are typically followed by linking and other boundary detection procedures designed to assemble edge pixels into meaningful boundaries.

Two principal properties used for establishing similarity of edge pixels in this kind of analysis are:

  • The strength of the response of the gradient operator used to produce the edge pixel.
  • The direction of the gradient.

A point (x’,y’) in the neighborhood of (x,y) is linked to the pixel at (x,y) if both the following magnitude and direction criteria are satisfied.

$$\begin{aligned} |\nabla f(x^{\prime}, y^{\prime}) - \nabla f(x, y)| &\leq \text{threshold } T_{m} \\ |\alpha(x^{\prime}, y^{\prime}) - \alpha(x, y)| &\leq \text{threshold } T_{d} \end{aligned}$$
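A sketch of this linking test for one neighbor pair (the arrays `mag` and `ang` would come from a gradient operator such as the Sobel sketch above; the thresholds `T_m` and `T_d` are user-chosen):

```python
def should_link(mag, ang, p, q, T_m, T_d):
    """Check the magnitude and direction criteria for linking pixel q to pixel p.

    mag, ang : gradient magnitude and direction arrays
    p, q     : (row, col) tuples, with q in the neighborhood of p
    """
    close_mag = abs(mag[q] - mag[p]) <= T_m
    close_dir = abs(ang[q] - ang[p]) <= T_d
    return close_mag and close_dir
```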

We can consider only those pixels that lie on or near the boundary between objects and the background, so that the associated histogram is well shaped and gives us a good chance of selecting a good threshold.

  • The gradient can indicate if a pixel is on an edge or not.
  • A preliminary edge map can be defined based on the gradient map.
  • Small local regions covering the detected edge pixels are studied.
  • For each local region, a threshold value is defined based on its histogram and thresholding is applied to segment the region.

Region-oriented segmentation

In previous methods, we partition an image into regions by finding boundaries between regions based on intensity discontinuities.

Here, segmentation is accomplished via thresholds based on the distribution of pixel properties, such as intensity or color.

  • The segmentation must be complete, i.e. every point must be in a region.
  • Points in a region must be connected.
  • The regions must be disjoint.
  • It deals with the properties that must be satisfied by the pixels in a segmented region - for example, $P(R_i) =$ TRUE if all pixels in $R_i$ have the same intensity.
  • Regions $R_i$ and $R_j$ are different in the sense of predicate $P$.

Region growing by pixel aggregation

  • a procedure that groups pixels or subregions into larger regions.

Pixel aggregation starts with a set of “seed” points and grows regions from them by appending to each seed point those neighboring pixels that have similar properties, such as gray level, texture and color.

  • A seed point (starting point) is selected by the user (usually the highest or lowest intensity)
  • Select a similarity threshold
  • Then start growing (following a fixed checking order), as sketched below
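A minimal sketch of region growing by pixel aggregation, assuming 4-connectivity and “gray-level difference from the seed below a threshold” as the similarity rule (the notes allow other properties such as texture or color):

```python
import numpy as np
from collections import deque

def region_grow(img, seed, T):
    """Grow a region from `seed`, adding 4-neighbors whose gray level is within T of the seed."""
    h, w = img.shape
    region = np.zeros((h, w), dtype=bool)
    region[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < h and 0 <= nc < w and not region[nr, nc]:
                if abs(float(img[nr, nc]) - float(img[seed])) <= T:
                    region[nr, nc] = True
                    queue.append((nr, nc))
    return region
```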

Examples:

Example:

Region splitting and merging

To subdivide an image initially into a set of arbitrary, disjoint regions and then merge and/or split the regions in an attempt to satisfy some predefined conditions.

The split and merge algorithm is done by:

  • Split any region for which the homogeneity condition fails into 4 disjoint quadrants
  • Merge any adjacent regions whose union satisfies the condition
  • Stop when no further merging or splitting is possible
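A sketch of the splitting half of the algorithm, assuming a square image whose side is a power of two and using “standard deviation below a limit” as the homogeneity predicate (the merging pass that re-joins adjacent homogeneous regions is omitted for brevity):

```python
import numpy as np

def split(img, r=0, c=0, size=None, max_std=10.0, min_size=2):
    """Recursively split a square region into quadrants until each is homogeneous."""
    size = img.shape[0] if size is None else size
    block = img[r:r + size, c:c + size]
    if size <= min_size or block.std() <= max_std:
        return [(r, c, size)]                       # homogeneous leaf region
    half = size // 2
    regions = []
    for dr, dc in ((0, 0), (0, half), (half, 0), (half, half)):
        regions += split(img, r + dr, c + dc, half, max_std, min_size)
    return regions
```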

Example (Step from left to right)

Quadtree

We can use a quadtree to show how the image is partitioned.

Image Representation

After segmenting an image into regions, the objective is to represent and describe the resulting aggregates of segmented pixels in a form suitable for further computer processing.

There are 2 choices for representing a region:

  • External characteristics: Its boundary
    • external representation is chosen when the primary focus is on shape characteristics
  • Internal characteristics: the pixels comprising the region
    • internal representation is selected when the primary focus is on reflectivity properties, such as color and texture

In either case, the features selected as descriptors should be as insensitive as possible to variations such as change in size, translation and rotation.

Representation schemes

The segmentation techniques yield raw data in the form of pixels along a boundary or pixels contained in a region.

Although these data are sometimes used directly to obtain descriptors (as in determining the texture of a region), standard practice is to use schemes that compact the data into representations that are considerably more useful in the computation of descriptors.

Chain Codes

  • Used to represent a boundary by a connected sequence of straight line segments of specified length and direction
  • The direction of each segment is coded by using a numbering scheme

The boundary of an object is sampled based on a sampling grid and then approximated with a sequence of segments connecting the grid nodes close to the boundary.

Example:

The accuracy of the resulting code representation depends on the spacing of the sampling grid.

How to determine the sampling grid size?

  • The larger the grid size:
    • the shorter the chain code
    • the less sensitive to noise
    • the lower the resolution
    • the coarser the approximation
  • The smaller the grid size:
    • the longer the chain code
    • the more sensitive to noise
    • the higher the resolution
    • the finer (less coarse) the approximation

Normalization for size

  • Alter the size of the resampling grid

Normalization for starting point

Also known as normalization with respect to the starting point.

  • Treat the code as a circular sequence and redefine the starting point so that the resulting sequence of numbers forms an integer of minimum magnitude.

Example 1: {766542422200} -> {007665424222}

Example 2: {422200766542} -> {007665424222}
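A small sketch of this normalization, with the chain code held as a list of digits (the two examples above are reproduced in the comments):

```python
def normalize_start(code):
    """Pick the circular rotation whose digit sequence forms the smallest integer."""
    rotations = [code[i:] + code[:i] for i in range(len(code))]
    return min(rotations, key=lambda c: int(''.join(map(str, c))))

normalize_start([7, 6, 6, 5, 4, 2, 4, 2, 2, 2, 0, 0])  # -> [0, 0, 7, 6, 6, 5, 4, 2, 4, 2, 2, 2]
normalize_start([4, 2, 2, 2, 0, 0, 7, 6, 6, 5, 4, 2])  # -> the same sequence
```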

Normalization for rotation

Also known as normalization with respect to orientation.

  • Use the first difference of the chain code instead of the code itself. The difference is obtained by counting (counter-clockwise) the number of directions that separate two adjacent elements of the code.

Example:

The first difference of the 4-direction chain code 10103322:

Treat the code as circular and take the difference of each adjacent pair, starting with (last → first): 2 → 1, 1 → 0, 0 → 1, 1 → 0, 0 → 3, 3 → 3, 3 → 2, 2 → 2.

Why is the difference for 2 → 1 equal to 3? The difference is counted anti-clockwise: going from direction 2 to direction 1 anti-clockwise passes through 3 directions, so the difference is 3 (equivalently, $(1 - 2) \bmod 4 = 3$).

The first difference is therefore 33133030.
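A sketch of the first difference computed as a modular difference, which is the same as counting directions anti-clockwise (the list-of-digits representation from above is assumed):

```python
def first_difference(code, directions=4):
    """Rotation-normalized chain code: circular, counter-clockwise differences."""
    return [(code[i] - code[i - 1]) % directions for i in range(len(code))]

first_difference([1, 0, 1, 0, 3, 3, 2, 2])  # -> [3, 3, 1, 3, 3, 0, 3, 0]
```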

The skeleton of a region

  • The structural shape of a plane region can be reduced to a graph.
  • This reduction can be accomplished by obtaining the skeleton of the region via a thinning algorithm.

Medial axis transformation

  • The skeleton of a region may be defined via the medial axis transformation (MAT) proposed by Blum in 1967.

How to find the medial axis?

  • Raster-scan all pixels in the object
  • For each pixel (see the sketch below):
    • if there is more than one shortest path (i.e. more than one closest boundary point at the minimum distance) from the pixel centre to the boundary
      • mark it as part of the medial axis (the skeleton of the region)
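A brute-force sketch of this definition on a binary image (`region` is True inside the object). Exact ties between discrete distances are fragile, so this is only meant to illustrate the rule, and it has exactly the cost problem noted further below:

```python
import numpy as np

def medial_axis_brute_force(region):
    """Mark pixels whose minimum distance to the boundary is attained more than once."""
    pad = np.pad(region, 1, constant_values=False)
    # boundary pixels: object pixels with at least one 4-neighbor outside the object
    inner = pad[:-2, 1:-1] & pad[2:, 1:-1] & pad[1:-1, :-2] & pad[1:-1, 2:]
    by, bx = np.nonzero(region & ~inner)
    skeleton = np.zeros_like(region)
    for r, c in zip(*np.nonzero(region)):
        d = np.hypot(by - r, bx - c)           # distance to every boundary pixel
        if np.count_nonzero(np.isclose(d, d.min())) > 1:
            skeleton[r, c] = True
    return skeleton
```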

Example:

Red pixels are on the medial axis; blue/green pixels are not (blue was used to show the idea of the shortest path to the boundary).

So we will have something like these.

  • Although the MAT produces a nice skeleton, its direct implementation is computationally expensive, which makes it infeasible for large regions/objects.
  • A direct implementation involves calculating the distance from every interior point to every point on the boundary of the region.

Example:

Correct Answer:

You may ask, why is it not a straight line instead?

Because a pixel is marked as medial axis (skeleton of the region) only if there is more than one shortest path (shortest distance) from its centre to the boundary.

So this one (the horizontal red line) is a wrong MAT.

Wrong Answer (and why it is wrong):

Thinning algorithm for binary regions

It produces something similar to the MAT, but not exactly the same.

  • Assume region points have value 1 and background points 0
  • A contour point is any pixel with value 1 and having at least one 8-neighbor valued 0.

The 8-neighbors are the eight pixels surrounding $p_1$.

So $p_2, p_3, \dots, p_9$ (counted clockwise) form the neighborhood of the contour point $p_1$.

The thinning method consists of successive passes of two steps applied to the contour points.

Step 1:

  • Flag a contour point $p_1$ for deletion if all the following conditions are satisfied:
    • (a) $2 \leq N(p_1) \leq 6$, where $N(p_1) = \sum_{i=2}^{9} p_i$
    • (b) $S(p_1) = 1$, where $S(p_1)$ is the number of “01” transitions in the ordered sequence $p_2, p_3, \dots, p_9$
    • (c) $p_2 \times p_4 \times p_6 = 0$
    • (d) $p_4 \times p_6 \times p_8 = 0$

Flagging a contour point for deletion means we just mark it first and delete it later.

Points are not deleted until all border points have been processed

After step 1, delete the flagged points

Step 2:

  • Flag the remaining contour points $p_1$ for deletion if all the following conditions are satisfied:
    • (a) $2 \leq N(p_1) \leq 6$, where $N(p_1) = \sum_{i=2}^{9} p_i$
    • (b) $S(p_1) = 1$, where $S(p_1)$ is the number of “01” transitions in the ordered sequence $p_2, p_3, \dots, p_9$
    • (c′) $p_2 \times p_4 \times p_8 = 0$
    • (d′) $p_2 \times p_6 \times p_8 = 0$

After step 2, delete the flagged points

Example of conditions (a) and (b) in Steps 1 and 2:

  • $N(p_1) = p_2 + p_3 + \dots + p_9 = 1 + 1 + 1 + 1 = 4$
  • The ordered neighborhood sequence is 01010110, which contains three “01” transitions, so $S(p_1) = 3$
  • The basic procedure is applied iteratively until no further points are deleted, at which time the algorithm terminates, yielding the skeleton of the region.
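A sketch of the two-step thinning pass described above, for a binary NumPy array (1 = region, 0 = background), with the neighbors $p_2 \dots p_9$ taken clockwise starting from the pixel directly above $p_1$:

```python
import numpy as np

def neighbors(img, r, c):
    """p2..p9, clockwise, starting from the pixel above (r, c)."""
    return [img[r-1, c], img[r-1, c+1], img[r, c+1], img[r+1, c+1],
            img[r+1, c], img[r+1, c-1], img[r, c-1], img[r-1, c-1]]

def thin(binary):
    """Iterative two-step thinning until no more points are deleted."""
    img = np.pad(binary.astype(np.uint8), 1)        # zero border avoids bound checks
    while True:
        changed = False
        for step in (1, 2):
            flagged = []
            for r, c in zip(*np.nonzero(img)):
                p = neighbors(img, r, c)             # p[0]..p[7] = p2..p9
                N = sum(p)
                S = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))
                if step == 1:
                    cd = p[0]*p[2]*p[4] == 0 and p[2]*p[4]*p[6] == 0   # (c), (d)
                else:
                    cd = p[0]*p[2]*p[6] == 0 and p[0]*p[4]*p[6] == 0   # (c'), (d')
                if 2 <= N <= 6 and S == 1 and cd:
                    flagged.append((r, c))           # flag only; delete after the pass
            for r, c in flagged:
                img[r, c] = 0
            changed = changed or bool(flagged)
        if not changed:
            return img[1:-1, 1:-1]
```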

Explanation: physical meaning of the conditions

(a)

$2 \leq N(p_1) \leq 6$, where $N(p_1) = \sum_{i=2}^{9} p_i$

This condition is violated when the contour point is

  • the end point of a skeleton stroke ($N(p_1) = 1$)
  • an inner pixel ($N(p_1) > 6$)

(b)

$S(p_1) = 1$, where $S(p_1)$ is the number of “01” transitions in the ordered sequence $p_2, p_3, \dots, p_9$

This condition is violated when it is applied to points on a stroke one pixel thick. Hence this condition prevents disconnection of segments of a skeleton during the thinning operation.

  • e.g.

$$\begin{array}{ccc} 0 & 0 & 1 \\ 1 & p_1 & 0 \\ 0 & 0 & 1 \end{array}$$

This point should not be deleted ($S(p_1) > 1$).

(c) and (d)

$p_2 \times p_4 \times p_6 = 0$

$p_4 \times p_6 \times p_8 = 0$

A point that satisfies conditions (a)-(d) is an east or south boundary point, or a northwest corner point of the boundary. Similarly, a point that satisfies conditions (a), (b), (c′) and (d′) is a north or west boundary point, or a southeast corner point of the boundary. In either case, $p_1$ is not part of the skeleton and should be removed.

Boundary descriptors

Polygonal approximations

The objective is to capture the essence of the boundary shape with the fewest possible polygonal segments.

Splitting Technique

To subdivide a segment successively into two parts until a given criterion is satisfied.

  • Step 1: Mark the two boundary pixels that are farthest apart as control points.
  • Step 2: Connect two neighbouring control points to form a line segment.
  • Step 3: If the distance from the segment to the farthest boundary pixel is greater than the threshold, mark that pixel as a new control point.
  • Step 4: Go to step 2 until no more new control points are found.
  • Step 5: Link all adjacent control points to form the contour of the object.
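A recursive sketch of steps 2-4 between two existing control points (an ordered list of boundary coordinates and the pair of initial control points are assumed to be given):

```python
def point_segment_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return ((px - ax) ** 2 + (py - ay) ** 2) ** 0.5
    return abs(dx * (py - ay) - dy * (px - ax)) / (dx ** 2 + dy ** 2) ** 0.5

def split_segment(points, i, j, threshold, controls):
    """Add new control point indices between boundary indices i and j (steps 2-4)."""
    if j <= i + 1:
        return
    dists = [point_segment_distance(points[k], points[i], points[j])
             for k in range(i + 1, j)]
    k_rel = max(range(len(dists)), key=dists.__getitem__)
    if dists[k_rel] > threshold:            # farthest pixel is too far from the segment
        k = i + 1 + k_rel
        controls.add(k)                     # step 3: mark it as a new control point
        split_segment(points, i, k, threshold, controls)   # step 4: repeat on both halves
        split_segment(points, k, j, threshold, controls)
```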

Animation Example:

  • O means the control points.

Example:

Split once

Calculate $r\cos 45^\circ = r/\sqrt{2}$

Distance from the farthest boundary pixel $= r - r/\sqrt{2} = r(1 - \frac{1}{\sqrt{2}})$

Threshold $= r(1 - 1/\sqrt{2})$ (given in the question)

Since the distance from the farthest boundary pixel is not greater than the threshold, there is no further splitting.

Fourier descriptors

Coordinate pairs of points encountered in traversing an N-point boundary in the xy plane are recorded as a sequence of complex numbers.

$(x, y) \rightarrow x + yj$

An N-point DFT is performed to the sequence and the complex coefficients obtained are called the Fourier descriptors of the boundary.

DFT Formula:

$$x[n] \longrightarrow X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi \frac{k}{N} n}$$

or in another notation:

$$\begin{aligned} X_{k} &= \sum_{n=0}^{N-1} x_{n} \cdot e^{-j \frac{2\pi}{N} k n} \\ &= \sum_{n=0}^{N-1} x_{n} \cdot \left[\cos\!\left(\frac{2\pi}{N} k n\right) - j \sin\!\left(\frac{2\pi}{N} k n\right)\right] \end{aligned}$$

Inverse DFT (IDFT):

$$X[k] \longrightarrow x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, e^{j 2\pi \frac{k}{N} n}$$

or in another notation:

$$\begin{aligned} x_{n} &= \frac{1}{N} \sum_{k=0}^{N-1} X_{k} \cdot e^{j 2\pi k n / N} \\ &= \frac{1}{N} \sum_{k=0}^{N-1} X_{k} \cdot \left[\cos\!\left(\frac{2\pi}{N} k n\right) + j \sin\!\left(\frac{2\pi}{N} k n\right)\right] \end{aligned}$$

  • Mark the boundary coordinates starting from east (the x axis) and going clockwise
  • Turn the coordinates into complex numbers: $(x, y) \rightarrow x + yj$
  • Run a DFT on the sequence to get the magnitude and phase of each coefficient
    • The magnitude is more useful than the phase
    • Rotating the image multiplies every coefficient by a constant phase factor: the magnitude does not change, only the phase
    • Translating the image (adding a constant offset to the sequence) affects only the DC coefficient $X[0]$; the magnitudes and phases of the other coefficients are unchanged
    • Scaling the image (multiplying the sequence by a scale factor) scales the magnitudes by that factor but does not change the phases
    • Using a different starting point (a circular shift of the sequence) does not affect the magnitudes

In general, only the first few coefficients have significant magnitude, and they are usually enough to describe the general shape of the boundary.
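A sketch of computing the descriptors with NumPy's FFT and rebuilding the shape from only the first few coefficients (`boundary` is assumed to be an ordered list of (x, y) pixels; `keep` counts the low-frequency terms retained from each end of the spectrum):

```python
import numpy as np

def fourier_descriptors(boundary):
    """Fourier descriptors of an ordered boundary of (x, y) pixel pairs."""
    xs, ys = np.asarray(boundary, dtype=float).T
    z = xs + 1j * ys                  # (x, y) -> x + yj
    return np.fft.fft(z)              # N-point DFT of the complex boundary sequence

def reconstruct(descriptors, keep):
    """Approximate the boundary using only the `keep` lowest-frequency terms."""
    d = np.zeros_like(descriptors)
    d[:keep] = descriptors[:keep]     # low positive frequencies (including DC)
    d[-keep:] = descriptors[-keep:]   # and their negative-frequency counterparts
    z = np.fft.ifft(d)
    return np.stack([z.real, z.imag], axis=1)
```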

  • Fourier descriptors are not directly insensitive to geometrical changes such as translation, rotation and scale changes, but the changes can be related to simple transformations on the descriptors.

Some basic properties of Fourier Descriptors:

Region descriptors

Descriptors providing measures of properties such as smoothness, coarseness and regularity are used to quantify the texture content of an object.

Texture descriptors

There are 3 principal approaches used in image processing to describe the texture of a region:

  • Statistical Approaches
  • Structural Approaches
  • Spectral Approaches

Statistical Approaches

One of the simplest approaches for describing texture is to use moments (e.g. mean and variance) of the gray-level histogram of an image or region.

  • Statistical approaches yield characterizations of textures as smooth, coarse, grainy, and so on.

  • Firstly find the object centre $\frac{\sum_i c'(i)}{N}$, where

    • $c'(i)$ are the coordinates
    • $N$ is the number of coordinates
  • Then take the centre as (0,0); all the coordinates are normalized (expressed relative to the centre) as $c(i)$.

  • Next, $\mu_n = \frac{\sum_i c(i)\, g(i)^n}{N}$, where

    • $c(i)$ are the normalized $c'(i)$ (coordinates)
    • $g(i)$ are the intensity values at the corresponding coordinates
  • Finally, the descriptor is $(\mu_1, \mu_2, \mu_3, \dots, \mu_n)$
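A sketch of this descriptor (the notes leave the exact form of $c(i)\,g(i)^n$ open, so computing one value per coordinate axis is an assumption made here for illustration):

```python
import numpy as np

def moment_descriptor(coords, intensities, orders=(1, 2, 3)):
    """Moment descriptor following the recipe above.

    coords      : (N, 2) array of pixel coordinates c'(i)
    intensities : (N,) array of gray levels g(i)
    """
    coords = np.asarray(coords, dtype=float)
    g = np.asarray(intensities, dtype=float)
    centre = coords.mean(axis=0)          # object centre = mean of the coordinates
    c = coords - centre                   # normalized coordinates, centre at (0, 0)
    # mu_n = sum(c(i) * g(i)^n) / N, computed separately for each coordinate axis
    return [(c * (g[:, None] ** n)).mean(axis=0) for n in orders]
```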

Structural Approaches

Structural techniques deal with the arrangement of image primitives.

  • Use a set of predefined texture primitives and a set of construction rules to define how a texture region is constructed with the primitives and the rules.

Spectral Approaches

Spectral techniques are based on properties of the Fourier spectrum and are used primarily to detect global periodicity in an image by identifying high-energy, narrow peaks in the spectrum.

  • The Fourier spectrum is ideally suited for describing the directionality of periodic or almost periodic 2-D patterns in an image.

3 Features of the spectrum are useful for texture description:

  • prominent peaks give the principal direction of the patterns
  • the location of the peaks gives the fundamental spatial period of the patterns
  • eliminating any periodic components via filtering leaves nonperiodic image elements, which can be described by statistical techniques

The spectrum can be expressed in polar coordinates to yield a function $S(r, \theta)$.

Two functions can then be used to describe the texture:

$$\begin{aligned} S(\theta) &= \sum_{r=0}^{R} S(r, \theta) \\ S(r) &= \sum_{\theta=0}^{\pi} S(r, \theta) \end{aligned}$$
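A sketch of computing $S(\theta)$ and $S(r)$ from the magnitude of the 2-D FFT (the binning choices, such as `n_theta` and folding all directions into $[0, \pi)$, are assumptions made for illustration):

```python
import numpy as np

def spectral_texture_profiles(img, n_theta=180):
    """S(theta) and S(r): sums of the centred Fourier spectrum over polar bins."""
    S = np.abs(np.fft.fftshift(np.fft.fft2(img)))        # spectrum magnitude
    h, w = S.shape
    cy, cx = h // 2, w // 2
    y, x = np.indices(S.shape)
    r = np.hypot(y - cy, x - cx).astype(int)             # radius bin
    theta = (((np.arctan2(y - cy, x - cx) % np.pi) / np.pi) * n_theta).astype(int) % n_theta
    S_theta = np.bincount(theta.ravel(), weights=S.ravel(), minlength=n_theta)
    S_r = np.bincount(r.ravel(), weights=S.ravel())
    return S_theta, S_r
```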

Example: