Digital Video and Camera Technologies

  • Digital Multimedia Information: all types of multimedia information are stored and processed within a computer in digital form.
  • Digital videos are represented as sequences of digital images, while analog videos are represented as sequences of continuous, time-varying signals.
  • Images are captured by a camera or video camera in analog form and subsequently digitized into digital form by electronic circuits.

Basics of Analog and Digital Videos

Analog signal: amplitude varies continuously with time.

In video and audio systems (audio is an integral part of a video system), it is necessary first to convert any time-varying analog signal into a digital form.

Techniques involved in analog-to-digital conversion:

  • Sampling
  • Quantization

Some Basics - Frequency in signal processing

We need to use compression to reduce the signal bandwidth so that it fits within the channel bandwidth.

To Convert Analog signal to Digital signal

Detail:

  • Quantization is the process that confines the amplitude of a signal to a finite number of values.
    • Quantization error: the difference between the actual signal amplitude and the corresponding nominal (quantized) amplitude.
      • Maximum quantization error: $\text{Quantization error}_{\max} = \frac{\text{Quantization step}}{2}$
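As a concrete illustration, here is a minimal Python sketch of sampling and quantization; the 440 Hz test tone, 8 kHz sampling rate, and the 8-bit range of [-1, 1] are illustrative assumptions, not values from these notes.

```python
import numpy as np

fs = 8000                                 # assumed sampling rate (samples/s)
t = np.arange(0, 0.01, 1 / fs)            # sampling: discrete time instants
analog = np.sin(2 * np.pi * 440 * t)      # a 440 Hz tone standing in for the analog signal

levels = 2 ** 8                           # 8 bits -> 256 quantization levels
step = 2.0 / levels                       # quantization step over the range [-1, 1]
quantized = np.round(analog / step) * step   # quantization: snap to the nearest level

# The worst-case quantization error is half the step size.
max_error = np.max(np.abs(analog - quantized))
print(f"step = {step:.6f}, max error = {max_error:.6f}")   # max error <= step / 2
```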

Analog/Digital Video

Videos (moving pictures): a sequence of digitized pictures.

Frame rates

  • full-motion video: 24-30 frames/s or even 60 frames/s for HDTV
  • animation: 15-19 frames/s
  • video telephony: 5-10 frames/s
  • video conferencing & interactive media applications: 15-30 frames/s

Resolutions

Resolution in the analog and digital worlds is similar, but there are some important differences in how it is defined.

  • In analog video, an image consists of lines, or TV-lines, because analog video technology is derived from the television industry.
  • In a digital system, an image consists of square pixels (short for picture elements).

Aspect Ratios

  • 4:3
  • 16:9

Progressive and Interlace Scanning

  • If we apply a voltage along the vertical deflection coil, it creates a magnetic field that moves the electron beam up and down.
  • If we apply a voltage along the horizontal deflection coil, it creates a magnetic field that moves the electron beam left and right.

We can position the beam at any point on the screen we want. Using this, we can create an image.

Progressive Scanning (Non-Interlaced Scanning)

A high frame rate should be used so that the eye cannot perceive the individual pictures

  • making up continuous motion

Disadvantages of Progressive Scanning

  • Flicker effect - the rate of 30 frames per second is not rapid enough to allow the brightness of one picture to blend smoothly into the next during the time the screen is black between frames
    • The result is a definite flicker of light as the screen is made alternately bright and dark
    • This flicker effect is worse at higher illumination levels

Phosphor persistence: the time a phosphor dot remains illuminated after being energized.

  • Longer-persistence phosphors reduce flicker

To Solve Flickering Problem:

  • A high frame rate should be used so that the eye cannot perceive the flicker effect
    • wider bandwidth required (a higher frame rate means a higher data rate)

Another Way To Solve Flickering Problem is Interlace Scanning.

Interlace Scanning

We divide the frame into two fields and scan them.

  • to utilize a reasonable transmission bandwidth
  • to achieve the effect of continuous motion (less flicker)

Field rate is different from Frame rate.

Field rate = $2 \times$ Frame rate

  • By interlacing the horizontal scanning lines in two groups (fields)
    • odd-numbered lines (odd-field)
    • even-numbered lines (even-field)
  • The repetition rate of the fields is 60 per second (NTSC)
    • as two fields are scanned during one frame period of 1/30 s.
    • In this way, 60 views of the picture are shown during 1 s.
    • This repetition rate is fast enough to eliminate flicker.

Disadvantages of Interlaced Scanning

  • Motion problem: greater visibility of motion effects because the number of scanning lines is halved for moving objects.

To solve the motion problem: Use Progressive Scanning

  • The penalty: twice the bandwidth is ordinarily required.
  • However, the digital compression used in digital video will allow sequential scanning to be available as an option.

Another solution:

Deinterlacing Technique: remove every other field (odd or even) and double the lines of each remaining field (consisting of only even or odd lines) by simple line doubling or, even better, by interpolation.
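A minimal NumPy sketch of both variants is given below; the function names and the choice to keep the even field are illustrative assumptions.

```python
import numpy as np

def deinterlace_line_double(frame: np.ndarray) -> np.ndarray:
    """Keep only the even field and repeat each remaining line (simple line doubling)."""
    field = frame[0::2, :]                  # even-numbered lines only
    return np.repeat(field, 2, axis=0)

def deinterlace_interpolate(frame: np.ndarray) -> np.ndarray:
    """Keep only the even field and rebuild odd lines by averaging their neighbours."""
    out = frame.astype(float).copy()
    out[1:-1:2, :] = (out[0:-2:2, :] + out[2::2, :]) / 2   # interpolate the dropped lines
    return out

frame = np.arange(16.0).reshape(4, 4)       # a tiny 4x4 stand-in for an interlaced frame
print(deinterlace_line_double(frame))
print(deinterlace_interpolate(frame))
```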

Composite Video Signals

The means by which the units of most TV systems communicate (the input signal of a TV).

Over a single cable, a composite video signal carries all the information, including:

  • brightness
  • color
  • synchronization

The amplitude of the video signal is divided into two parts:

  • The upper 70% is used for the camera signal
  • The lower 30% is used for the synchronizing pulses

Horizontal and Vertical Scanning Frequencies

  • Vertical Scanning Frequency
    • the rate at which the electron beam completes its cycles of vertical motion, from top to bottom and back to top again, ready to start the next vertical scan, i.e. the field rate (PAL: 50 Hz, NTSC: 60 Hz)
  • Horizontal Scanning Frequency
    • the rate at which the electron beam completes its cycles of horizontal motion, from left to right and back to left again, ready to start the next horizontal scan.
    • This is also the number of lines scanned per second.

NTSC TV System (North America and Japan)

  • Lines per frame = 525
  • Field rate = 60 Hz
  • Time of each vertical scanning cycle (one field) = $1/60$ s
  • Number of lines per field = $525/2 = 262.5$ lines
    • A half line exists.
  • Horizontal frequency = $262.5 \times 60 = 15750$ Hz
  • Time for each horizontal scanning line = $1/15750$ s
    • In microseconds: $H\text{ time} = \frac{1{,}000{,}000}{15{,}750}\,\mu\text{s} = 63.5\,\mu\text{s}$

The H time (single-line time) is $63.5\,\mu\text{s}$.

PAL TV System (China, Europe, …)

  • Lines per frame = 625
  • Field rate = 50 Hz
  • Time of each vertical scanning cycle (one field) = $1/50$ s
  • Number of lines per field = $625/2 = 312.5$ lines
    • A half line exists.
  • Horizontal frequency = $312.5 \times 50 = 15625$ Hz
  • Time for each horizontal scanning line = $1/15625$ s
    • In microseconds: $H\text{ time} = \frac{1{,}000{,}000}{15{,}625}\,\mu\text{s} = 64\,\mu\text{s}$

The H time (single-line time) is $64\,\mu\text{s}$.
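The line-timing arithmetic for both systems can be verified with a few lines of Python (a sketch; the variable names are illustrative):

```python
for name, lines_per_frame, field_rate in [("NTSC", 525, 60), ("PAL", 625, 50)]:
    lines_per_field = lines_per_frame / 2      # interlaced: two fields per frame
    h_freq = lines_per_field * field_rate      # lines scanned per second
    h_time_us = 1_000_000 / h_freq             # single-line time in microseconds
    print(f"{name}: {h_freq:.0f} Hz, H time = {h_time_us:.1f} us")
# NTSC: 15750 Hz, H time = 63.5 us
# PAL:  15625 Hz, H time = 64.0 us
```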

Horizontal sync pulse, retrace and blanking

  • Front Porch
    • ensures that the end of each line is blanked before the retrace (flyback) begins
  • Back Porch
    • accommodates the color burst, which synchronizes the color portion of a color TV

Horizontal flyback from the composite video signal

Goal:

  • Use integration (through an RC circuit) to obtain the triangular (sawtooth) waveform from the sync pulse.
    • Conversely, differentiation can be used to recover the horizontal sync pulse from the triangular (sawtooth) waveform.

Vertical sync and vertical blanking

  • Time of line/field must be precise


Digital Video format

Resolution in Analog Video (480i60 vs 576i50)

Naming convention defining the number of lines, type of scan, and refresh rate:

“i” stands for interlaced scanning.

“p” stands for progressive scanning.

  • NTSC (480i60) : 480 lines, refresh rate of 60 interlaced fields per second (or 30 full frames per second).
  • PAL (576i50) : 576 lines, refresh rate of 50 interlaced fields per second (or 25 full frames per second).

Digital Video Basics - Frame

All images are displayed in the form of a two dimensional matrix of individual picture elements called pixels.

Resolution = Horizontal resolution × Vertical resolution

8 bits per pixel = 256 different colors (pixel values 0 to 255)

  • When analog video is digitized, the maximum number of pixels that can be created is based on the number of TV lines available for digitization.
  • NTSC: the maximum size of the digitized image is 720×480 pixels (NTSC D1).
  • PAL: the maximum size is 720×576 pixels (PAL D1).

There are also other formats:

  • 4× Common Intermediate Format (4CIF): 704 × 480 (NTSC) or 704 × 576 (PAL)
  • 2× Common Intermediate Format (2CIF): 704 × 240 (NTSC) or 704 × 288 (PAL)
  • Quarter Common Intermediate Format (QCIF): 176 × 120 (NTSC) or 176 × 144 (PAL)

RGB format

To represent a color image, three 8-bit channels (R, G, B) are used:

8 + 8 + 8 = 24 bits per pixel

Exercise:

Derive the size of an image in the RGB digitization format. Assume the image has a resolution of 1920×1080 and that each pixel of each component is represented by 8 bits.

No. of pixels per component = 1920 x 1080 = 2073600

Bits of Red channel image: 1920 x 1080 x 8 bits = 16588800 bits

Bits of Green channel image: 1920 x 1080 x 8 bits = 16588800 bits

Bits of Blue channel image: 1920 x 1080 x 8 bits = 16588800 bits

Size of image: 1920 x 1080 x 8 x 3 bits = 49766400 bits

Assume the frame rate of this HD video is 60 fps.

Data rate of video = 49766400 x 60 = 2985984000 bits/second

This is huge! So we need to find a way to compress the image; therefore, YUV is introduced.

YUV format

  • One luminance and two chrominance components are used to describe the color of each pixel.
  • Y - the luminance component
    • The luminance component describes the brightness of a pixel.
  • UV (or CbCr) - the two chrominance components
    • The chrominance components carry the color information of a pixel.

The major advantage is that the chrominance components can be subsampled to compress the color information.

YUV 4:2:2, YUV 4:2:0

The resolution of the eye is more sensitive to luminance than it is to color.

Using the above image as an example again:

Bits of Red channel image: 1920 x 1080 x 8 bits = 16588800 bits

Bits of Green channel image: 1920 x 1080 x 8 bits = 16588800 bits

Bits of Blue channel image: 1920 x 1080 x 8 bits = 16588800 bits

Size of image in RGB format: 1920 x 1080 x 8 x 3 bits = 49766400 bits

What if it is in YUV 4:2:2 format?

4:2:2 - horizontal resolution of the chroma channels reduced by half (4 -> 2)

Bits of Y channel image: 1920 x 1080 x 8 bits = 16588800 bits

Bits of U channel image: 1920/2 x 1080 x 8 bits = 8294400 bits

Bits of V channel image: 1920/2 x 1080 x 8 bits = 8294400 bits

Size of image in YUV 4:2:2 format: 33177600 bits

What if it is in YUV 4:2:0 format?

4:2:0 - both vertical and horizontal resolutions of the chroma channels reduced by half (4 -> 2)

Bits of Y channel image: 1920 x 1080 x 8 bits = 16588800 bits

Bits of U channel image: 1920/2 x 1080/2 x 8 bits = 4147200 bits

Bits of V channel image: 1920/2 x 1080/2 x 8 bits = 4147200 bits

Size of image in YUV 4:2:0 format: 24883200 bits

Size of YUV 4:2:0 < Size of YUV 4:2:2 < Size of RGB

We mainly use YUV 4:2:0 for high definition video.

We only use YUV 4:2:2 for professional production (e.g. movies).
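As a sketch of how this subsampling works in practice, the chroma planes can simply be decimated with array slicing; the NumPy code below uses random placeholder data for a full-resolution chroma plane.

```python
import numpy as np

h, w = 1080, 1920
u = np.random.randint(0, 256, (h, w), dtype=np.uint8)   # a full-resolution chroma plane

u_422 = u[:, ::2]      # 4:2:2 - keep every other column (horizontal resolution halved)
u_420 = u[::2, ::2]    # 4:2:0 - keep every other row and column (both halved)

print(u_422.shape)     # (1080, 960)
print(u_420.shape)     # (540, 960)
```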

Some Useful Formulas

$$\text{Size of a frame in RGB} = H \times W \times 3 \text{ bytes}$$

$$\text{Size of a frame in YUV 4:2:2} = H \times W \times 2 \text{ bytes}$$

$$\text{Size of a frame in YUV 4:2:0} = H \times W \times 1.5 \text{ bytes}$$

$$\text{Data rate} = \text{size of a frame} \times \text{frame rate (frames/s)}$$

$$\text{Storage required} = \text{data rate} \times \text{time in seconds}$$
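A small helper applying these formulas, assuming 8 bits per sample; the function names are illustrative. It reproduces the worked 1920×1080 numbers above.

```python
def frame_size_bytes(h: int, w: int, fmt: str) -> float:
    bytes_per_pixel = {"RGB": 3, "YUV422": 2, "YUV420": 1.5}[fmt]
    return h * w * bytes_per_pixel

def data_rate_bits(h: int, w: int, fmt: str, fps: int) -> float:
    return frame_size_bytes(h, w, fmt) * 8 * fps

print(int(frame_size_bytes(1080, 1920, "RGB") * 8))      # 49766400 bits
print(int(frame_size_bytes(1080, 1920, "YUV422") * 8))   # 33177600 bits
print(int(frame_size_bytes(1080, 1920, "YUV420") * 8))   # 24883200 bits
print(int(data_rate_bits(1080, 1920, "RGB", 60)))        # 2985984000 bits/s
```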

RGB to YUV (CCIR 601 standard) transformation

$$\begin{aligned} Y &= 0.299R + 0.587G + 0.114B \\ U &= 0.564(B-Y) \\ V &= 0.713(R-Y) \end{aligned}$$

When R=G=B (Grayscale),

Y = R = G = B

U = V = 0 (No color information)
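A direct transcription of these equations in Python confirms the grayscale case (the function name is illustrative):

```python
def rgb_to_yuv(r: float, g: float, b: float):
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.564 * (b - y)
    v = 0.713 * (r - y)
    return y, u, v

print(rgb_to_yuv(200, 30, 90))     # a saturated color: nonzero U and V
print(rgb_to_yuv(128, 128, 128))   # grayscale: (128.0, 0.0, 0.0) up to float rounding
```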

VGA and XGA Format

VGA (Video Graphics Array) and XGA (Extended Graphics Array) resolutions

  • With the introduction of network cameras, 100 percent digital systems can be designed.

  • This makes the limitations of NTSC and PAL irrelevant.

  • Derived from the computer industry.

  • They are worldwide standards and provide better flexibility.

  • VGA resolution is normally better suited for network cameras because the video will be shown, in most cases, on computer screens with resolutions in VGA or multiples of VGA.

High-Definition Television (HDTV) Resolutions

  • Derived from the TV industry
  • Video in HDTV resolution contains much more detail because the number of pixels is larger.
  • Compatibility with computer displays

Digital Camera Basics

Imager – a device that converts the optical image of a scene into an electrical signal.

Camera categories

  • Studio, professional and consumer.
  • Analog and digital.

Camcorder – a portable camera combined with a video recorder.

Studio cameras

  • designed to provide optimum performance in a controlled environment, usually at the expense of size, weight and portability.

  • A larger camera can support a larger lens:

    • faster optical speed and greater zoom range (e.g. sporting-event pickup).
  • When a studio audience is present, larger cameras look more professional.

  • Remote control operator

    • In a studio setting, it is desirable for the camera operator to devote all attention to framing and focusing the picture and moving the camera. All technical matters should be handled elsewhere.

Professional Camcorders

  • News gathering: has its own internal recorder and is completely free of any cable connection when necessary.
  • Larger than the minimum that might otherwise be possible:
    • A fast lens with a wide zoom is desired.
    • Professionals want interchangeable lenses, so they can choose a different lens for different operating conditions.
    • Shoulder mounting of the camera is desirable because this gives the operator more control over the camera movement and gives a higher shooting angle.
    • Other facilities, such as lights, microphones, radio intercom, should be mountable on the camera.

Consumer Camcorders

  • The overriding objective of a consumer camcorder is low price.
  • Hand-held camcorder: weighs less than 1 kg (including batteries).

CCD (Charge Coupled Device) Camera

  • A digital video can also be obtained from a digital camera using a CCD as the sensor.
  • In this case, the whole image is projected directly onto the cells of the CCD sensor.
  • The intensity values of the cells are shifted out and quantized by a built-in A/D circuit to form the pixels of the frame.

Active Picture and Blanking Periods

Video is a sequence of frames.

  • Active Picture Period - the time during which the picture is shown on the display.
  • Blanking Period - the preparation time for capturing the next frame after the previous one.


The camcorder’s Image Sensor

The lens in a camcorder also serves to focus light, but instead of focusing it onto film, it shines the light onto a small semiconductor image sensor.

An image sensor is an array of pixel-sized solid-state photosensitive elements, each of which generates and stores an electric charge in a “well” at its photosite when it is illuminated.

  • Each photosite measures the amount of light (photons) that hits a particular point, and translates this information into electrons (electrical charges):
    • A brighter image is represented by a higher electrical charge, and a darker image is represented by a lower electrical charge.
    • (The number of electrons collected is proportional to the number of photons hitting that particular photosite)

Imager is a rectangular array of sensors upon which an image of the scene is focused.

  • In most configurations, the sensor includes the circuitry that stores and transfers its charge to a shift register, which converts the spatial array of charges in the imager into a time-varying video output current.

Camera Sensors

The two types of sensor technology:


Charge-Coupled Device (CCD) vs Complementary Metal-Oxide-Semiconductor (CMOS)

|  | CCD | CMOS |
| --- | --- | --- |
| Quality | High-quality, low-noise images | More susceptible to noise |
| Light sensitivity | Higher light sensitivity | Lower light sensitivity |
| Power consumption | Consumes lots of power | Consumes little power |
| Cost | Expensive | Inexpensive |
| Mass production | More mature | Less mature |
  • Quality
    • CCD: create high-quality, low-noise images.
    • CMOS: are more susceptible to noise.
  • Light sensitivity
    • Since each pixel on a CMOS sensor has several transistors located next to it, the light sensitivity of a CMOS chip is lower: many of the photons hit the transistors instead of the photodiode.
  • Power consumption
    • CMOS: consume little power
    • CCD: consumes lots of power. (~100 times more power)
  • Cost
    • CMOS chips can be fabricated on standard silicon production lines, so they are inexpensive compared to CCDs.
  • Mass production
    • CCD sensors have been mass produced for a longer period of time, so they are more mature. They tend to have higher quality pixels, and more of them.

CCD Operation

CCD on-chip operations


Charge Integration (During Active Picture Period)


Readout (During Vertical Blanking Period)


Highlight Overload (During Active Charge Integration)


Color Separation Methods

Method 1: Beam Splitter

The high-end solution: a 3-chip camcorder.

A beam splitter separates the incoming light into three different versions of the same image:

  • the levels of red light, green light and blue light.
  • Each of these images is captured by its own chip, but each measures the intensity of only one color of light.
  • The camera then overlays these three images and the intensities of the different primary colors blend to produce a full-color image.

Disadvantages:

  • bulky and expensive
  • difficulties in maintaining image registration

Method 2: CFA Filter (Bayer Filter)

The lower-cost solution: a 1-chip camcorder.

A more practical way to record the 3 primary colors from a single image is to place a permanent filter over each individual photosite:

A CFA (color filter array) contains an array of photosites, each of which is sensitive to only one color spectral band.

img

Advantages:

  • all color information is recorded at the same time
  • the camera can be smaller and cheaper

Sensor Architecture of CFA filter

  • Microlens (on-chip lens)
    • Since the sensitive area of a CCD pixel is only a fraction of its total area, the sensitivity of CCDs can be increased by mounting a layer of tiny lenses in front of the sensor.
    • focus and concentrate light onto the photodiode surface instead of allowing it to fall on non-photosensitive areas of the device, where it is lost from the imaging information collected by the CCD
    • enhance light gathering ability

Mosaicing problem in CFA filter

The raw output from a sensor with a CFA filter is a mosaic of red, green and blue pixels of different intensity.

A miscoloring problem might occur:


A demosaicing algorithm is used to tackle this mosaicing problem.

Demosaicing Algorithm

The idea of the algorithm is to:

  • reconstruct a full color image from the spatially sub-sampled color channels output from the CFA

Key idea: each colored pixel can be used more than once. The true color of a single pixel can be determined by averaging the values from the closest surrounding pixels

  • First, do the channel separation
  • Then estimate missing values by averaging the values from the closest surrounding pixels
    • Linear interpolation (starting from the top left)

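Below is a minimal NumPy sketch of this bilinear demosaicing for an assumed RGGB Bayer layout; it implements the "average the closest surrounding pixels" idea, not any particular camera's pipeline.

```python
import numpy as np

def demosaic_bilinear(mosaic: np.ndarray) -> np.ndarray:
    h, w = mosaic.shape
    out = np.zeros((h, w, 3))

    # Step 1: channel separation - mark where each color was actually sampled (RGGB).
    masks = np.zeros((h, w, 3), dtype=bool)
    masks[0::2, 0::2, 0] = True            # R at even row, even column
    masks[0::2, 1::2, 1] = True            # G at even row, odd column
    masks[1::2, 0::2, 1] = True            # G at odd row, even column
    masks[1::2, 1::2, 2] = True            # B at odd row, odd column

    # Step 2: estimate each missing value by averaging the known values
    # in its 3x3 neighbourhood (linear interpolation).
    padded = np.pad(mosaic, 1)
    padded_masks = np.pad(masks, ((1, 1), (1, 1), (0, 0)))
    for c in range(3):
        acc = np.zeros((h, w))
        cnt = np.zeros((h, w))
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                vals = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                known = padded_masks[1 + dy:1 + dy + h, 1 + dx:1 + dx + w, c]
                acc += vals * known
                cnt += known
        out[..., c] = acc / np.maximum(cnt, 1)   # pixels keep their own sample where present
    return out

mosaic = np.random.randint(0, 256, (6, 6)).astype(float)   # a tiny raw CFA output
print(demosaic_bilinear(mosaic).shape)                     # (6, 6, 3) full-color image
```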

CCD Transfer and Readout Architectures

The 3 different architectures of CCD

Full-frame CCD Architecture


  • Charge is shifted vertically in the parallel CCD shift register, then horizontally in the serial CCD shift register.
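As a toy illustration of this two-stage shift (not a real device's timing), the readout can be simulated in Python; the names and readout direction are illustrative assumptions.

```python
import numpy as np

def full_frame_readout(pixels: np.ndarray) -> list:
    array = pixels.astype(float).copy()
    stream = []
    for _ in range(array.shape[0]):
        serial_register = array[-1, :].copy()   # bottom row enters the serial register
        array = np.roll(array, 1, axis=0)       # parallel (vertical) shift: rows move down
        array[0, :] = 0                         # the top row is now empty charge wells
        stream.extend(serial_register[::-1].tolist())   # serial (horizontal) shift: clock pixels out
    return stream

pixels = np.arange(12.0).reshape(3, 4)          # a tiny 3x4 "sensor"
print(full_frame_readout(pixels))               # one time-varying output stream
```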

Advantage of Full-frame CCD Architecture

  • Good sensitivity and high resolution: the entire pixel array is used to detect incoming photons during exposure to the object being imaged.

Disadvantage of Full-frame CCD Architecture

  • low readout rates: not suitable for high frame-rate video capture (digital videos)
  • Transfer smear: vertical streaks above and below bright spots in the image

Transfer smear:

  • The charges in the image array move across the optical image during the transfer of charges.
  • The sensor array elements pick up small spurious charges during the transfer, which cause streaks in the picture that are most visible on picture highlights.

It can be prevented by a rotating mechanical shutter that blocks the light from the image during the transfer period, but a mechanical component can be easily damaged in an otherwise all-electronic system.

Frame-transfer CCD Architecture


The CCD structure is divided into three sections

  • an imaging area

  • a field storage array (as buffer)

  • an output register

  • Charge is shifted vertically in the parallel CCD shift register, then horizontally in the serial CCD shift register.

    • During the vertical blanking period, a command from a clock causes the charges in each column of pixels in the sensing area to be shifted to the corresponding column in the storage area. This frees the sensors to accumulate charge from the illumination of the next frame.
  • The image is first shifted to the storage area, then shifted to the output.

Advantage of Frame-transfer CCD

  • Higher readout rates (faster frame rate): During the period in which the parallel storage array is being read, the image array is busy integrating charge for the next image frame.

Disadvantage of Frame-transfer CCD

  • Transfer smear: still exists, but is significantly reduced.
  • Higher cost: twice the silicon area is required, so frame-transfer devices are more costly to produce.

Interline-transfer CCD Architecture

commonly used in consumer cameras

  • The sensors in vertical columns are connected to storage elements in alternate columns.
  • The image is first shifted very rapidly to the storage area, then shifted to the output.

Advantage of Interline-transfer CCD

  • Higher readout rates (faster frame rate)
  • Reduced smear, no shutter required: smear, a common problem with frame-transfer CCDs, is also reduced with the interline CCD architecture because of the rapid speed (only one or a few microseconds) at which image transfer occurs.

Disadvantage of Interline-transfer CCD

  • Lower image resolution and higher cost: higher unit cost to produce chips with the more complex architecture
  • Lower sensitivity: a decrease in photosensitive area present at each pixel site.
    • (This shortcoming can be partially overcome by incorporation of microlenses on the photodiode array complex to increase the amount of light entering each element.)