Tuto 6

Q1: The pitch of a middle C note played on a flute is 261 Hz. What are the pitches of the other C notes played on the same flute? What do you expect the pitches of the D notes played on the same flute to be?

In equal temperament, the frequency ratio between any two adjacent half-steps is the 12th root of 2 ($r = \sqrt[12]{2} \approx 1.0595$).

There are 12 musical half-steps in one musical octave.

Freq. of a note $\times\, r$ = Freq. of the next half-step up

Freq. of a note $\times\, r^{12}$ = Freq. of a note $\times\, 2$ = Freq. of the same note 1 octave higher

So to find the pitches of the same note in other octaves, simply multiply or divide by 2.

The pitches of the C notes played with the same flute are:

32.625, 65.25, 130.5, 261, 522, 1044, 2088, 4176 and 8352 Hz.

Note D is 2 half-steps above Note C.

pitch of D = pitch of $C \times r \times r = C \times \sqrt[12]{2} \times \sqrt[12]{2} = C \times 2^{\frac{1}{12}} \times 2^{\frac{1}{12}} = C \times 2^{\frac{1}{6}} \approx 1.1225 \times C$

Pitches of D notes played with the same flute are:

36.62, 73.24, 146.48, 292.96, 585.93, 1171.9, 2343.7, 4687.4, 9374.8 Hz.
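The octave arithmetic above can be checked with a short Python sketch. It assumes equal temperament ($r = 2^{1/12}$) and lists the same nine octaves as above, from three octaves below to five octaves above each note; `octave_series` is a hypothetical helper name, not something from the tutorial.

```python
# Equal-temperament half-step ratio r = 2^(1/12)
R = 2 ** (1 / 12)

def octave_series(base_hz, below=3, above=5):
    """Shift base_hz by whole octaves: x2 per octave up, /2 per octave down."""
    return [base_hz * 2 ** k for k in range(-below, above + 1)]

c_notes = octave_series(261)           # middle C = 261 Hz
d_notes = octave_series(261 * R ** 2)  # D is two half-steps above C

print(c_notes)
print([round(f, 2) for f in d_notes])
```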

Q2: If we add 2 identical sounds together, what will be the increase in dB?

If the two sounds are identical and in phase, they add constructively, so the amplitude ratio is 2 to 1.

$\text{dB difference} = 20\log_{10}2 \approx +6\text{ dB}$

Remarks:

$10\log_{10}\left(\frac{\text{Power}_2}{\text{Power}_1}\right) = 10\log_{10}\left(\frac{\text{Amplitude}_2}{\text{Amplitude}_1}\right)^2 = 20\log_{10}\left(\frac{\text{Sound pressure}_2}{\text{Sound pressure}_1}\right)$
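As a quick numerical check of the remark, here is a minimal Python sketch (the helper names are hypothetical) that converts amplitude and power ratios to dB. It assumes the two sounds are in phase, so the amplitude doubles and the power quadruples:

```python
import math

def db_from_amplitude_ratio(ratio):
    """Level change in dB for an amplitude (or sound pressure) ratio."""
    return 20 * math.log10(ratio)

def db_from_power_ratio(ratio):
    """Level change in dB for a power ratio."""
    return 10 * math.log10(ratio)

# Two identical, in-phase sounds: amplitude x2, power x4
print(round(db_from_amplitude_ratio(2), 2))  # 6.02
print(round(db_from_power_ratio(4), 2))      # 6.02
```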

Q3: What is the relation between loudness and sound pressure level in decibels? Is 80 dB twice as loud as 40 dB? How do you translate from decibels to loudness?

Sound level in dB is a physical quantity and can be measured objectively.
Loudness is a perceived quantity; it can only be measured by asking people about loudness or relative loudness, and different people give different answers.

Relating the two is called psychophysics. Psychophysics experiments show that subjects report a doubling of loudness for each increase in sound level of approximately 10 dB.
(So roughly speaking, 50 dB is twice as loud as 40 dB, 60 dB is twice as loud as 50 dB, etc.)

So 80 dB is not twice as loud as 40 dB: it is 40 dB (four 10 dB steps) higher, so it is roughly 2 × 2 × 2 × 2 = 16 times as loud.
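The 10 dB rule of thumb can be written as loudness ratio $= 2^{\Delta L / 10}$, where $\Delta L$ is the level difference in dB. Here is a minimal sketch of that rule; it is an approximation only, and `loudness_ratio` is a hypothetical name:

```python
def loudness_ratio(level_db, reference_db):
    """Rule of thumb: perceived loudness doubles for every +10 dB."""
    return 2 ** ((level_db - reference_db) / 10)

print(loudness_ratio(50, 40))  # 2.0
print(loudness_ratio(80, 40))  # 16.0
```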

$\text { The Minimum sampling frequency } \mathrm{f}_{\mathrm{s}}=2 \mathrm{f}_{\mathrm{max}}$

Each component has the form $\sin(2\pi f t)$, so $f$ is the coefficient of $t$ divided by $2\pi$.

Find the highest frequency:

$f$ in $3\sin(200\pi t)$ -> $f = 100$ Hz

$f$ in $6\sin(400\pi t)$ -> $f = 200$ Hz

$f$ in $\sin(500\pi t)$ -> $f = 250$ Hz $= f_{max}$

$f_s = 2f_{max} = 500$ Hz
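The steps above can be sketched in Python as follows. The sketch assumes each component is written as $\sin(k\pi t)$, so that $f = k/2$; the amplitudes do not affect $f_{max}$. `nyquist_rate` is a hypothetical helper:

```python
def nyquist_rate(angular_coeffs_over_pi):
    """Each term sin(k*pi*t) = sin(2*pi*f*t), so f = k/2 Hz.
    Return (f_max, minimum sampling rate 2*f_max)."""
    freqs = [k / 2 for k in angular_coeffs_over_pi]
    f_max = max(freqs)
    return f_max, 2 * f_max

# Terms 3 sin(200 pi t), 6 sin(400 pi t), sin(500 pi t)
print(nyquist_rate([200, 400, 500]))  # (250.0, 500.0)
```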

The frequency spectrum of a real signal is symmetric about 0 Hz.

$T_s = 1$ ms

$f_s = \frac{1}{T_s} = 1000$ Hz

The signal is then sampled at 1 kHz.

Aliasing occurs: components above $f_s/2 = 500$ Hz fold back (mirror about 500 Hz) into the 0 to 500 Hz band.

Therefore:

$f(t) = A\sin(\omega t)$

$f_{rms}=\sqrt{\frac{1}{T} \int_{0}^{T}\left(A \sin (\omega t)\right)^{2} dt}$

$=\sqrt{\frac{A^2}{T} \int_{0}^{T}\sin^2(\omega t)\, dt}$

Note: $\sin^2(x) = \frac{1}{2} - \frac{1}{2} \cos(2x)$


$=\sqrt{\frac{A^2}{T} \int_{0}^{T}\left(\frac{1}{2} - \frac{\cos(2\omega t)}{2}\right)dt}$

$=\sqrt{\frac{A^2}{T}\times\frac{1}{2} \int_{0}^{T}(1 - \cos(2\omega t))\,dt}$

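Actually, we can get the result directly here.

The integral of $\cos$ or $\sin$ over a whole number of periods ($0$ to $nT$) is 0, so the $\cos(2\omega t)$ term contributes nothing.

Sketching the graph makes this clear: the positive and negative areas cancel.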

Therefore $=\sqrt{\frac{A^2}{T} \frac{T}{2}} = \frac{\sqrt{2}}{2}A$

Alternatively, evaluate the integral explicitly. Note $\int \cos(at)\,dt = \frac{\sin(at)}{a}$ for a constant $a$.

Therefore $\int \cos(2\omega t)\,dt = \frac{\sin(2\omega t)}{2\omega}$

$=\sqrt{\frac{A^2}{T}\times\frac{1}{2} \left(\int_{0}^{T}1\,dt - \int_{0}^{T}\cos(2\omega t)\,dt\right)}$

$=\sqrt{\frac{A^2}{T}\times\frac{1}{2} \left([t]^T_0 - \left[\frac{\sin(2\omega t)}{2\omega}\right]^T_0\right)}$

$=\sqrt{\frac{A^2}{T}\times\frac{1}{2} \left((T-0) - \left(\frac{\sin(2\omega T)}{2\omega}-\frac{\sin(2\omega \cdot 0)}{2\omega}\right)\right)}$

$=\sqrt{\frac{A^2}{T}\times\frac{1}{2} \left(T-\frac{\sin(2\omega T)}{2\omega}\right)}$

Since $\omega T = 2\pi$, $\sin(2\omega T) = \sin(4\pi) = 0$:

$=\sqrt{\frac{A^2}{T}\times\frac{1}{2} (T-0)}$

$=\sqrt{\frac{A^2}{T} \frac{T}{2}} = \frac{\sqrt{2}}{2}A$
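The result can also be checked numerically. This sketch approximates the RMS integral with a Riemann sum of equally spaced samples over exactly one period; the method and the amplitude and frequency values are illustrative choices:

```python
import math

def rms_of_sine(amplitude, freq_hz, n=10_000):
    """Approximate sqrt((1/T) * integral_0^T (A sin(wt))^2 dt) using n equally
    spaced samples over exactly one period T = 1/f."""
    period = 1 / freq_hz
    dt = period / n
    total = sum((amplitude * math.sin(2 * math.pi * freq_hz * k * dt)) ** 2
                for k in range(n))
    return math.sqrt(total * dt / period)

print(rms_of_sine(3, 50), 3 / math.sqrt(2))  # the two values agree
```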

Tuto 7

To check whether the two structures are identical, find their transfer functions.

Define intermediate variables to simplify the calculation.

For 1a:

$R = X + (-Yz^{-1}) = X - Yz^{-1}$

$S = R + Sz^{-1}$

$S(1 - z^{-1}) = R$

$S = \frac{R}{1-z^{-1}}$

$Y = S + N$

$Y = N + \frac{X- Yz^{-1}}{1 - z^{-1}}$

$(1-z^{-1})Y = (1-z^{-1})N + X - Yz^{-1}$

$(1-z^{-1})Y + Yz^{-1} = (1-z^{-1})N + X$

$Y = X + (1-z^{-1})N = X + H_1(z)N$, where $H_1(z) = 1 - z^{-1}$

For 1b:

$Y = A + N$

$A = X - (Y-A)z^{-1}$

$A = X - Yz^{-1} + Az^{-1}$

$(1 - z^{-1})A = X-Yz^{-1}$

$A = \frac{X - Yz^{-1}}{1-z^{-1}}$

$Y = N + \frac{X - Yz^{-1}}{1-z^{-1}}$

$(1-z^{-1})Y = (1-z^{-1})N + X - Yz^{-1}$

$(1-z^{-1})Y + Yz^{-1} = (1-z^{-1})N + X$

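$Y = X + (1-z^{-1})N = X + H_1(z)N$

Both structures give the same output, with signal transfer function 1 and noise transfer function $H_1(z) = 1 - z^{-1}$, so they are identical.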
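A time-domain check, as a minimal sketch: the difference equations below are read off the z-domain equations for 1a and 1b, with $N$ treated as an arbitrary additive error sequence and a zero initial state (both assumptions for illustration). Both structures should produce $y[n] = x[n] + e[n] - e[n-1]$, which is the time-domain form of $X + (1 - z^{-1})N$.

```python
def structure_1a(x, e):
    """R = X - Y z^-1,  S = R + S z^-1,  Y = S + N  (zero initial state)."""
    y, y_prev, s_prev = [], 0, 0
    for xn, en in zip(x, e):
        r = xn - y_prev
        s = r + s_prev
        yn = s + en
        y.append(yn)
        y_prev, s_prev = yn, s
    return y

def structure_1b(x, e):
    """A = X - (Y - A) z^-1,  Y = A + N  (zero initial state)."""
    y, y_prev, a_prev = [], 0, 0
    for xn, en in zip(x, e):
        a = xn - (y_prev - a_prev)
        yn = a + en
        y.append(yn)
        y_prev, a_prev = yn, a
    return y

x = [3, -1, 4, 1, -5, 9, 2, -6]   # arbitrary test input
e = [1, 0, -2, 1, 1, -1, 0, 2]    # arbitrary "noise" sequence N
# X + (1 - z^-1) N in the time domain: x[n] + e[n] - e[n-1]
expected = [xn + en - ep for xn, en, ep in zip(x, e, [0] + e[:-1])]
print(structure_1a(x, e) == structure_1b(x, e) == expected)  # True
```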

Q2(i): Derive the output sequence.

Keep writing down the input, output and error signal at each step until the error signal returns to 0.

Q2(ii): Is the output a periodic pattern sequence? What is the period of the sequence if it is?

It repeats the pattern “01010”, so the period is 5.
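The question's input value is not reproduced in these notes. This sketch therefore assumes a constant input of 0.4 and a 1-bit quantizer that outputs 1 when its input is at least 0.5 and 0 otherwise (both hypothetical), fed through the loop of structure 1a. Exact fractions avoid floating-point residue, so you can see the error return to exactly 0:

```python
from fractions import Fraction

def first_order_sigma_delta(x):
    """Loop of structure 1a: s[n] = x[n] - y[n-1] + s[n-1], y[n] = Q(s[n]).
    Q is an assumed 1-bit quantizer: 1 if s >= 1/2, else 0."""
    y_prev, s_prev = 0, Fraction(0)
    out, err = [], []
    for xn in x:
        s = xn - y_prev + s_prev
        yn = 1 if s >= Fraction(1, 2) else 0
        out.append(yn)
        err.append(yn - s)        # quantization error N = Y - S
        y_prev, s_prev = yn, s
    return out, err

out, err = first_order_sigma_delta([Fraction(2, 5)] * 10)  # assumed input 0.4
print("".join(map(str, out)))  # 0101001010
print(err[4])                  # 0 -> the loop state repeats from here
```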

Q2(iii): What is the problem with a periodic output? Suggest a solution to solve this problem.

A constant-level input results in a regular, periodic output pattern. A pattern that repeats every $P$ samples produces a tone at $f_s/P$. If the period is long enough for this tone to fall in the audible band, it can be heard as a deterministic or oscillatory tone (an idle tone) rather than as noise.

Solution: add dither (a small random signal) to the input fed into the quantizer, which breaks up the regular pattern.
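Here is a sketch of the dither fix. It assumes a constant input of 0.4, a hypothetical 1-bit quantizer with threshold 0.5 inside a first-order loop, and uniform random dither of amplitude 0.2 added at the quantizer input; the dither amplitude and seed are arbitrary choices. Without dither the output has period 5. With dither it no longer repeats, while its average stays close to the input:

```python
import random

def sigma_delta(x, dither_amp=0.0, seed=0):
    """First-order loop s[n] = x[n] - y[n-1] + s[n-1] with an assumed 1-bit
    quantizer (1 if its input >= 0.5, else 0). Uniform dither in
    [-dither_amp, dither_amp] is added to the quantizer input only."""
    rng = random.Random(seed)
    y_prev, s_prev, out = 0, 0.0, []
    for xn in x:
        s = xn - y_prev + s_prev
        d = rng.uniform(-dither_amp, dither_amp)
        yn = 1 if s + d >= 0.5 else 0
        out.append(yn)
        y_prev, s_prev = yn, s
    return out

def has_period(seq, p):
    """True if seq[i] == seq[i + p] for every valid i."""
    return all(seq[i] == seq[i + p] for i in range(len(seq) - p))

x = [0.4] * 10_000
plain = sigma_delta(x)
dithered = sigma_delta(x, dither_amp=0.2)
print(has_period(plain, 5), has_period(dithered, 5))  # True False
print(sum(plain) / len(x), sum(dithered) / len(x))    # both close to 0.4
```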

Q3(i): Give the transfer function of the filter in z domain.

$\text{Transfer function} = H(z) = \frac{Y(z)}{X(z)}$

Take the z-transform of the difference equation, here $y[n] = \frac{1}{3}(x[n] + x[n-1] + x[n-2])$, then rearrange to make $\frac{Y(z)}{X(z)}$ the subject.

In this case:

$Y(z) = \frac{1}{3}(X(z) + X(z)z^{-1} + X(z)z^{-2})$

$Y(z) = X(z)\times \frac{1}{3}(1 + z^{-1} + z^{-2})$

$H(z) = \frac{Y(z)}{X(z)} = \frac{1}{3}(1 + z^{-1} + z^{-2}) = \frac{1}{3}(\frac{z^2}{z^2} + \frac{z^1}{z^2} + \frac{1}{z^2}) = \frac{z^2 + z^1 + 1}{3z^2}$

Q3(ii): Determine the zero(s) and pole(s) of the filter (if there is any).

Zeros = roots of the numerator of $H(z)$ (values of $z$ where $H(z) = 0$)

Poles = roots of the denominator of $H(z)$ (values of $z$ where $H(z) \to \infty$)

in this case,

Zeros: $z_{1}=-\frac{1}{2} + j \frac{\sqrt{3}}{2}, z_{2}=-\frac{1}{2}-j \frac{\sqrt{3}}{2}$ (roots of $z^2 + z + 1 = 0$)

Poles: $z = 0$ (a double pole, from $3z^2 = 0$)
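The zeros and poles can be checked with the quadratic formula. In this minimal Python sketch, `quadratic_roots` is a hypothetical helper. It also confirms that the zeros lie on the unit circle at angles $\pm\frac{2\pi}{3}$, which is where the frequency response drops to 0:

```python
import cmath

def quadratic_roots(a, b, c):
    """Roots of a z^2 + b z + c = 0 (complex-safe)."""
    disc = cmath.sqrt(b * b - 4 * a * c)
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)

# H(z) = (z^2 + z + 1) / (3 z^2)
zeros = quadratic_roots(1, 1, 1)   # numerator z^2 + z + 1
poles = quadratic_roots(3, 0, 0)   # denominator 3 z^2: double pole at 0
print(zeros)                       # approximately -0.5 +/- 0.866j
print([abs(z) for z in zeros], [cmath.phase(z) for z in zeros])
print(poles)                       # both at 0
```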

Q3(iii): Is it a FIR?

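If all poles are at $z = 0$ (no feedback, so the output depends only on a finite number of inputs), it is FIR. Otherwise it is IIR.

In this case, both poles are at $z = 0$, therefore it is FIR.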

Q3(iv): Determine the frequency response of the filter.

Power frequency response:

$P(\omega)=\left|H\left(e^{j\omega}\right)\right|^{2}=\left.H(z) H\left(\frac{1}{z}\right)\right|_{z=e^{j\omega}}$

Special thing to remember:

$z^n + z^{-n} = 2\cos(n\omega)$ when $z = e^{j\omega}$

Therefore:

$H(z)H(z^{-1}) = \frac{z^2 + z + 1}{3z^2} \times \frac{z^{-2} + z^{-1} + 1}{3z^{-2}} = \frac{(1 + z^{-1} + z^{-2})(1 + z + z^{2})}{9}$

$= \frac{1+z^1+z^2+z^{-1}+1+z^1+z^{-2}+z^{-1}+1}{9}$

$= \frac{3+2z^1+2z^{-1}+z^2+z^{-2}}{9}$

$= \frac{3+4\cos(\omega)+2\cos(2\omega)}{9}$ (at $z = e^{j\omega}$)

$= \frac{1}{3} + \frac{4}{9}\cos(\omega) + \frac{2}{9}\cos(2\omega)$

Q3(v): Sketch the frequency response of the filter.

Evaluate the power response formula at $\omega = 0$, $\frac{\pi}{2}$, $\frac{2\pi}{3}$ and $\pi$:

$|P(0)| = \frac{3+4+2}{9} = 1$

$|P(\frac{\pi}{2})| = \frac{3+0-2}{9} = \frac{1}{9}$

$|P(\frac{2\pi}{3})| = \frac{3-2-1}{9} = 0$

$|P(\pi)| = \frac{3-4+2}{9} = \frac{1}{9}$

Plotting these points gives the power frequency response.

Or, taking the square root:

$|H(0)| = 1$

$|H(\frac{\pi}{2})| = \frac{1}{3}$

$|H(\frac{2\pi}{3})| = 0$

$|H(\pi)| = \frac{1}{3}$

Plotting these points gives the amplitude frequency response.
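This sketch evaluates $H(e^{j\omega})$ directly with complex arithmetic and compares $|H|^2$ with the derived power formula at the four frequencies above:

```python
import cmath
import math

def H(z):
    """H(z) = (1/3)(1 + z^-1 + z^-2): the 3-point moving average."""
    return (1 + z ** -1 + z ** -2) / 3

def power_formula(w):
    """P(w) = (3 + 4 cos w + 2 cos 2w) / 9, as derived above."""
    return (3 + 4 * math.cos(w) + 2 * math.cos(2 * w)) / 9

ws = (0, math.pi / 2, 2 * math.pi / 3, math.pi)
for w in ws:
    h = H(cmath.exp(1j * w))
    print(f"w={w:.4f}  |H|^2={abs(h) ** 2:.4f}  P={power_formula(w):.4f}  |H|={abs(h):.4f}")
```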

Q3(vi): Based on (v), determine if the filter is an LPF, HPF or BPF.

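Gain at low frequencies > gain at high frequencies: lowpass filter (LPF)

Gain at high frequencies > gain at low frequencies: highpass filter (HPF)

Gain in a middle band > gain at both low and high frequencies: bandpass filter (BPF)

In this case the gain is 1 at $\omega = 0$ and is $\frac{1}{3}$ or less for $\omega \ge \frac{\pi}{2}$, so it is an LPF.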

Q3(vii): Implement the filter.
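The block diagram for this part is not reproduced here. Below is a minimal Python sketch of one direct-form implementation of $y[n] = \frac{1}{3}(x[n] + x[n-1] + x[n-2])$, using two delay elements and assuming a zero initial state. Its impulse response has three equal taps and then stops, consistent with an FIR filter:

```python
def moving_average_3(x):
    """y[n] = (x[n] + x[n-1] + x[n-2]) / 3, with x[n] = 0 for n < 0.
    Two delay elements hold x[n-1] and x[n-2]."""
    d1 = d2 = 0.0               # delay line: d1 = x[n-1], d2 = x[n-2]
    y = []
    for xn in x:
        y.append((xn + d1 + d2) / 3)
        d2, d1 = d1, xn         # shift the delay line by one sample
    return y

print(moving_average_3([3, 3, 3, 3]))  # [1.0, 2.0, 3.0, 3.0]
print(moving_average_3([1, 0, 0, 0]))  # impulse response: three taps of 1/3, then 0
```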

Tuto 8

Q1: A perceptual audio codec is used to compress an audio signal. The codec groups every 4 barks into a subband and then allocates bits to different subbands according to the result of a spectrum analysis based on a psychoacoustic model. All samples in the same subband are quantized with the same quantizer, whose bit resolution is allocated by the codec. (The Bark scale is a psychoacoustical scale proposed by Eberhard Zwicker in 1961.)

Q1(i) Locate the potential maskers.

Potential maskers are all the audible local maxima, i.e. peaks that lie above the threshold in quiet.

Blue ovals are also local maxima, but they are not audible, therefore they are not potential maskers.

Red ovals are the potential maskers.
Positions of 7 potential maskers: bark 7, 11, 14, 15, 18, 21 and 23.
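Here is a sketch of the masker-picking rule on made-up data. The spectrum and threshold in quiet below are hypothetical, not the tutorial's figure. The rule keeps local maxima that lie above the threshold in quiet; the end points are skipped for simplicity.

```python
def potential_maskers(levels_db, threshold_db):
    """Indices of local maxima lying above the threshold in quiet.
    levels_db[i] and threshold_db[i] are levels (dB) at band i."""
    found = []
    for i in range(1, len(levels_db) - 1):
        is_peak = levels_db[i] > levels_db[i - 1] and levels_db[i] >= levels_db[i + 1]
        if is_peak and levels_db[i] > threshold_db[i]:
            found.append(i)
    return found

# Hypothetical data: peaks at 2 (audible), 5 (below threshold), 8 (audible)
levels = [10, 30, 50, 20, 15, 25, 10, 40, 70, 30]
quiet = [20, 20, 20, 20, 20, 30, 20, 20, 20, 20]
print(potential_maskers(levels, quiet))  # [2, 8]
```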

Q1(ii) Based on the given psychoacoustic model, derive the masking threshold.

The psychoacoustic model defines how each masker (arrow) masks the sounds around it; this gives the masking curve of that masker (masking curve derivation).

For example (this example does not match the question):

Blue is then used to indicate the inaudible (masked) sounds.

In this question, such a psychoacoustic model is given.

Fig. 1b shows the psychoacoustic model. Apply its masking curve to each masker to obtain the masking threshold.
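The exact masking curve comes from Fig. 1b, which is not reproduced here. As an illustration only, this sketch assumes a hypothetical triangular curve that peaks 10 dB below the masker, falls 25 dB per Bark below it and 10 dB per Bark above it. It combines the individual curves by taking their maximum at each Bark, which is one common simplification:

```python
def masking_threshold(maskers, n_barks, offset_db=10.0,
                      slope_below=25.0, slope_above=10.0):
    """maskers: list of (bark, level_db). Each masker spreads a triangular
    curve (in dB) peaking at level_db - offset_db. All shape parameters are
    hypothetical stand-ins for the model in Fig. 1b. The overall threshold at
    each Bark is the maximum of the individual curves."""
    threshold = [float("-inf")] * n_barks
    for bark, level in maskers:
        for b in range(n_barks):
            dist = b - bark
            slope = slope_above if dist > 0 else slope_below
            curve = level - offset_db - slope * abs(dist)
            threshold[b] = max(threshold[b], curve)
    return threshold

print(masking_threshold([(2, 60.0), (6, 40.0)], 9))
```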

Q1(iii) Determine the Signal-to-Mask levels of each subband.

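SMR in each subband = highest signal level - lowest masking threshold in that subband (in dB). A subband whose signal lies entirely below the masking threshold is fully masked and gets 0 dB.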

Subband 1: 0 dB

Subband 2: 45 - 18 = 27 dB

Subband 3: 0 dB

Subband 4: 60 - 35 = 25 dB

Subband 5: 50 - 42 = 8 dB

Subband 6: 85 - 50 = 35 dB

Subband 7: 0 dB

Subband 8: 0 dB

Q1(iv) Suppose allocating one additional bit to a subband results in a 6dB drop of the noise floor in that subband. Allocate an appropriate number of bits to all subbands.

6 dB = 1 bit

$\text{bits} = \left\lceil\frac{\text{SMR (dB)}}{6\text{ dB}}\right\rceil$

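Subband 1: 0 bits

Subband 2: $\lceil 27 / 6 \rceil = \lceil 4.5 \rceil = 5$ bits

Subband 3: 0 bits

Subband 4: $\lceil 25 / 6 \rceil = \lceil 4.17 \rceil = 5$ bits

Subband 5: $\lceil 8 / 6 \rceil = \lceil 1.33 \rceil = 2$ bits

Subband 6: $\lceil 35 / 6 \rceil = \lceil 5.83 \rceil = 6$ bits

Subband 7: 0 bits

Subband 8: 0 bits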
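Here is the SMR and bit-allocation arithmetic as a minimal Python sketch. The levels are the ones read from the figure above. Treating a fully masked subband as 0 dB SMR is an assumption that matches the 0 dB entries:

```python
import math

def smr_db(max_signal_db, min_mask_db):
    """SMR of a subband; a fully masked subband (SMR <= 0) gets 0 dB."""
    return max(0, max_signal_db - min_mask_db)

def bits_for(smr, db_per_bit=6):
    """Smallest number of bits whose 6 dB-per-bit noise drop covers the SMR."""
    return math.ceil(smr / db_per_bit)

# (highest signal level, lowest mask) for subbands 2, 4, 5 and 6
smrs = [0, smr_db(45, 18), 0, smr_db(60, 35), smr_db(50, 42), smr_db(85, 50), 0, 0]
print(smrs)                         # [0, 27, 0, 25, 8, 35, 0, 0]
print([bits_for(s) for s in smrs])  # [0, 5, 0, 5, 2, 6, 0, 0]
```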

Bonus


Q4(i)

$\text { The Minimum sampling frequency } \mathrm{f}_{\mathrm{s}}=2 \mathrm{f}_{\mathrm{max}}$

$f_s = 2 \times 4\text{ kHz} = 8\text{ kHz}$

Q4(ii)

Original:

Sampled at 8 kHz:

Q4(iii)

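The reconstruction LPF passes frequencies up to 8 kHz.

Therefore the spectral images created by sampling must lie outside the LPF passband. The first image starts at $f_s - 4$ kHz, so we need $f_s - 4\text{ kHz} \ge 8\text{ kHz}$.

Therefore the sampling rate is 12 kHz.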
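The same condition can be written as a one-line sketch. `min_fs_for_reconstruction` is a hypothetical helper, frequencies are in kHz, and the sketch assumes a 4 kHz signal bandwidth and an ideal reconstruction LPF:

```python
def min_fs_for_reconstruction(f_max_khz, lpf_cutoff_khz):
    """The first spectral image occupies [fs - f_max, fs + f_max]; it must
    start at or above the reconstruction LPF cutoff: fs - f_max >= cutoff."""
    return lpf_cutoff_khz + f_max_khz

print(min_fs_for_reconstruction(4, 8))  # 12 (kHz)
```

If the LPF cutoff is at $f_{max}$ itself, the same rule gives back the Nyquist rate $2f_{max} = 8$ kHz of Q4(i).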

Q4(iv)

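Now the sampling rate is 6 kHz, so the Nyquist frequency is $f_s/2 = 3$ kHz.

"With anti-aliasing filter" means the input first passes through an LPF that removes components above 3 kHz before sampling, so nothing can alias.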

With Anti-Aliasing :

Without Anti-Aliasing :

Without the anti-aliasing filter, any components between 3 and 4 kHz fold back into the 2 to 3 kHz range and distort the signal, so we should use an anti-aliasing filter.
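Here is a sketch of the folding rule. `alias_frequency` is a hypothetical helper and frequencies are in kHz. A tone at $f$ sampled at $f_s$ appears at the distance from $f$ to the nearest multiple of $f_s$, which always lies in $[0, f_s/2]$:

```python
def alias_frequency(f, fs):
    """Apparent frequency after sampling a tone f at rate fs: fold f into
    [0, fs/2] by subtracting the nearest multiple of fs."""
    return abs(f - fs * round(f / fs))

# Sampling at 6 kHz without an anti-aliasing filter
for f in (1.0, 2.5, 3.5, 4.0):
    print(f, "->", alias_frequency(f, 6.0))
```

For example, a 3.5 kHz tone sampled at 6 kHz shows up at 2.5 kHz, on top of any genuine 2.5 kHz content. That is why the filter must act before sampling.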

$\text{The number of Quantization levels} = 2^{\text{wordlength}}$

$\text{Quantization step} = \frac{range}{\text{levels}}$

$\text{Maximum error} = \pm\frac{\text{Quantization step}}{2}$

Therefore in this case:

levels = $2^3 = 8$

quantization step = $\frac{1-(-1)}{8} = \frac{1}{4}$

Maximum error = $\pm\frac{1}{8}$
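Here are the same three formulas as a minimal Python sketch, for a 3-bit word over the range $-1$ to $1$. `quantizer_params` is a hypothetical name:

```python
def quantizer_params(wordlength_bits, lower, upper):
    """Levels, step size and maximum rounding error of a uniform quantizer."""
    levels = 2 ** wordlength_bits
    step = (upper - lower) / levels
    return levels, step, step / 2

print(quantizer_params(3, -1, 1))  # (8, 0.25, 0.125)
```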

Sampling Frequency $f_s = 1$ kHz

Sampling Period $T_s = \frac{1}{1000}$ s

$nT_s = \frac{n}{1000}$

Recall Sampling:

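$x(t) \rightarrow x(nT_s)$

Therefore the sampled sine wave (with $f = 100$ Hz):

$\sin(2\pi ft) \rightarrow \sin(2\pi f nT_s) = \sin(2\pi \times 100 \times \frac{n}{1000})= \sin(\frac{n\pi}{5})$

Samples per period $= \frac{f_s}{f} = \frac{1000}{100} = 10$

Therefore substitute $n = 0, 1, 2, \ldots, 9$ to get one period.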
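This short sketch generates one period of these samples, using the same $f_s = 1$ kHz and $f = 100$ Hz:

```python
import math

fs = 1000                      # sampling frequency in Hz
f = 100                        # sine frequency in Hz
Ts = 1 / fs                    # sampling period in s
samples_per_period = fs // f   # 10

# x(n Ts) = sin(2 pi f n Ts) = sin(n pi / 5) for n = 0 .. 9
x = [math.sin(2 * math.pi * f * n * Ts) for n in range(samples_per_period)]
for n, v in enumerate(x):
    print(n, round(v, 4))
```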

Let's assume the signal is uniformly distributed, i.e. its probability density function is constant over

$\text{lower range} < x(n) < \text{upper range}$

Let the word length be $b$ bits.

Total levels = $2^b$

Range = Upper range - Lower range

$\text{Quantization step} = \frac{\text{range}}{\text{total levels}}$

Finding the mean:

Let the probability density be $p(x)$.

Since it is a uniform probability density function, every value in the range is equally likely, so $p(x) = \frac{1}{\text{range}}$.

$\text{Mean Value} = mean(x) = \int_{\text{lower}}^{\text{upper}} x p(x) d x = \frac{1}{\text{range}}\int_{\text{lower}}^{\text{upper}} x d x$

According to the reverse power rule $\int x^{n} dx=\frac{x^{n+1}}{n+1}$, further solving the mean value (with range = upper - lower):

$\frac{1}{\text{range}}\int_{\text{lower}}^{\text{upper}} x\, dx = \frac{1}{\text{range}}\left[\frac{x^{1+1}}{1+1}\right]^\text{upper}_\text{lower} = \frac{1}{\text{range}}\left(\frac{\text{upper}^2}{2} - \frac{\text{lower}^2}{2}\right) = \frac{\text{upper} + \text{lower}}{2}$

The mean square value is computed on the signal itself. It equals the variance only when the mean is zero (e.g. for a range symmetric about 0).

$x^2_{rms} = \int_{\text{lower}}^{\text{upper}} x^2 p(x)\, dx = \frac{1}{\text{range}}\int_{\text{lower}}^{\text{upper}} x^2\, dx$ (= Variance when the mean is 0)

According to the reverse power rule $\int x^{n} dx=\frac{x^{n+1}}{n+1}$, further solving:

$\frac{1}{\text{range}}\int_{\text{lower}}^{\text{upper}} x^2 d x = \frac{1}{\text{range}}[\frac{x^{2+1}}{2+1}]^\text{upper}_\text{lower} = \frac{1}{\text{range}}(\frac{upper^3}{3} - \frac{lower^3}{3})$

Then the signal RMS:

$x_{rms} = \sqrt{x^2_{rms}}$ (equal to the standard deviation when the mean is zero)

Then find the peak factor, using $x_{peak} = \frac{\text{range}}{2}$ (for a range symmetric about 0)

$\text{Peak Factor} = P_F=\frac{x_{p e a k}}{x_{r m s}}$

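Then find the quantization noise's $N_{rms}$.

We assume the quantization noise is also uniformly distributed, with $p(x) = \frac{1}{\text{step}}$.

Same integral method, but the range is from $-\frac{\text{step}}{2}$ to $+\frac{\text{step}}{2}$:

$N_{rms} = \sqrt{\int_{-\frac{\text{step}}{2}}^{\frac{\text{step}}{2}} x^2 p(x)\, dx} = \sqrt{\frac{\text{step}^2}{12}} = \frac{\text{step}}{\sqrt{12}}$

Equivalently, $N_{rms} = \frac{\text{Quantization step}/2}{\sqrt{3}}$: the noise peak divided by the peak factor of a uniform distribution, $\sqrt{3}$.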
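The $\frac{\text{step}}{\sqrt{12}}$ result can be checked by simulation. This sketch assumes a mid-rise uniform quantizer (each input maps to the centre of its step) on $[-1, 1]$ with 3 bits, and uniformly distributed random inputs. The quantizer design, seed and sample count are illustrative choices:

```python
import math
import random

def quantize(x, lower, upper, bits):
    """Mid-rise uniform quantizer: map x to the centre of its step."""
    levels = 2 ** bits
    step = (upper - lower) / levels
    index = min(int((x - lower) / step), levels - 1)  # clamp x == upper
    return lower + (index + 0.5) * step

rng = random.Random(1)
lower, upper, bits = -1.0, 1.0, 3
step = (upper - lower) / 2 ** bits
xs = [rng.uniform(lower, upper) for _ in range(100_000)]
errors = [quantize(x, lower, upper, bits) - x for x in xs]
n_rms = math.sqrt(sum(e * e for e in errors) / len(errors))
print(n_rms, step / math.sqrt(12))  # both close to 0.0722
```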

Finally, the SNR (signal-to-noise ratio) as an amplitude ratio:

$\text{SNR} = \frac{x_{rms}}{N_{rms}}$

Convert it to dB:

$\text{SNR in dB} = 20\log_{10}\left(\frac{x_{rms}}{N_{rms}}\right)$
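Putting the steps together, here is a minimal Python sketch for a signal uniformly distributed over the full quantizer range. The function name is hypothetical. It follows the formulas above for the mean, mean square, RMS and $N_{rms}$:

```python
import math

def uniform_signal_snr_db(lower, upper, bits):
    """Signal uniformly distributed on (lower, upper), quantized with
    2**bits levels spanning the same range. Returns (mean, x_rms, SNR in dB)."""
    span = upper - lower
    mean = (upper ** 2 / 2 - lower ** 2 / 2) / span
    mean_square = (upper ** 3 / 3 - lower ** 3 / 3) / span
    x_rms = math.sqrt(mean_square)
    step = span / 2 ** bits
    n_rms = step / math.sqrt(12)          # = (step/2) / sqrt(3)
    return mean, x_rms, 20 * math.log10(x_rms / n_rms)

mean, x_rms, snr = uniform_signal_snr_db(-1, 1, 3)
print(mean, round(x_rms, 4), round(snr, 2))  # 0.0 0.5774 18.06
```

For a range symmetric about 0, $x_{rms}/N_{rms} = 2^b$. The SNR is therefore $20\log_{10}2^b \approx 6.02b$ dB, or about 18.06 dB with 3 bits.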