Brief Introduction to Object Detection - RCNN and YOLO
A CNN performs classification, while R-CNN performs object detection. The difference between object detection and classification algorithms is that a detection algorithm also draws a bounding box around each object of interest (localization) to locate it within the image.
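As a rough illustration (a minimal sketch assuming a recent PyTorch/torchvision install; the model choices and tensor sizes are placeholders, not the post's own code), a classifier returns only class scores for the whole image, while a detector returns bounding boxes together with labels and scores:

```python
import torch
import torchvision

# Classification: one score per class for the whole image, no location information.
classifier = torchvision.models.resnet18(weights=None)
classifier.eval()
with torch.no_grad():
    logits = classifier(torch.rand(1, 3, 224, 224))
print(logits.shape)  # [1, 1000] class scores

# Detection: a set of (box, label, score) predictions, i.e. classification + localization.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None, weights_backbone=None)
detector.eval()
with torch.no_grad():
    pred = detector([torch.rand(3, 480, 640)])[0]
print(pred["boxes"].shape)   # [N, 4] bounding boxes (x1, y1, x2, y2)
print(pred["labels"].shape)  # [N] class labels
print(pred["scores"].shape)  # [N] confidence scores
```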
ML techniques in Speech and Speaker Recognition
ML techniques in Speech and Speaker Recognition, including HMM, DNN-HMM, GMM-UBM, GMM-SVM, i-vector, x-vector, and i-vector/PLDA.
Some Practice Questions on GMM, SVM and K-means
Some Practice Questions on traditional ML methods. The topics cover SVM, K-means clustering, Gaussian Mixture Models (GMM), the EM algorithm, and Dimension Reduction approaches.
Common approaches in Dimension Reduction
Explaining the curse-of-dimensionality problem, Principal Component Analysis (PCA), Fisher Discriminant Analysis (FDA), and Linear Discriminant Analysis (LDA).
Paper Review - AnimeGAN
Studying image-to-image translation. Overview of the 2019 ISICA paper "AnimeGAN: A Novel Lightweight GAN for Photo Animation".
Paper Review - White-box Cartoonization
Studying image-to-image translation. Overview of the 2020 CVPR paper "Learning to Cartoonize Using White-box Cartoon Representations".
Paper Review - CartoonGAN
Studying image-to-image translation. Overview of the 2018 CVPR paper "CartoonGAN: Generative Adversarial Networks for Photo Cartoonization".
Why can the Transformer achieve the same tasks as Bi-LSTM and Seq2Seq models?
A Seq2Seq model takes a sequence as input and produces a sequence as output. The input and output sequences are not always of the same length; in other words, the length of the output sequence is determined by the model. Problems such as Machine Translation and Speech Recognition therefore call for a Seq2Seq model with an encoder-decoder architecture. The Transformer can handle such tasks because it also has an encoder-decoder architecture: the encoder processes the input sequence into hidden states, which provide information for the decoder to predict the output sequence. The Transformer uses an auto-regressive decoder, which feeds previously predicted outputs back in as input to generate the next prediction. In this way the Transformer can determine the output length itself and thus solve Seq2Seq problems like Speech Recognition.
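As a rough sketch of that idea (assuming a recent PyTorch; the vocabulary size, BOS/EOS token ids, and length cap are illustrative assumptions, not part of the original post), greedy auto-regressive decoding with `nn.Transformer` lets the decoder decide when to emit the end token and hence how long the output is:

```python
import torch
import torch.nn as nn

VOCAB, D_MODEL, BOS, EOS = 1000, 64, 1, 2  # assumed toy vocabulary and special tokens

embed = nn.Embedding(VOCAB, D_MODEL)
transformer = nn.Transformer(d_model=D_MODEL, nhead=4, batch_first=True)
to_vocab = nn.Linear(D_MODEL, VOCAB)

src = torch.randint(0, VOCAB, (1, 17))    # input sequence of length 17
memory = transformer.encoder(embed(src))  # encoder turns the input into hidden states

# Auto-regressive decoding: feed previously predicted tokens back in and
# stop when the model emits EOS, so the model itself decides the output length.
out = torch.tensor([[BOS]])
for _ in range(50):                        # hard cap on output length
    mask = transformer.generate_square_subsequent_mask(out.size(1))
    dec = transformer.decoder(embed(out), memory, tgt_mask=mask)
    next_token = to_vocab(dec[:, -1]).argmax(-1, keepdim=True)
    out = torch.cat([out, next_token], dim=1)
    if next_token.item() == EOS:
        break
print(out.shape)                           # output length chosen by the decoder, not by the input
```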
Some Practice Questions on common Deep Learning architectures
Some Practice Questions on common Deep Learning architectures. The topics cover Autoencoders, CNNs, RNNs, GANs, and Transformers.
ResNet and DenseNet
ResNet builds shortcut connections using Add, while DenseNet further exploits the effect of shortcut connections using Concat. Example code attached.
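For a rough feel of the difference (a minimal sketch assuming PyTorch; the channel counts and growth rate are placeholder values, not the attached example code), the two styles of shortcut connection can be written as:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """ResNet-style block: output = F(x) + x, so the channel count stays the same."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return torch.relu(self.conv(x) + x)          # shortcut via Add

class DenseLayer(nn.Module):
    """DenseNet-style layer: output = concat(x, F(x)), so features accumulate."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, growth_rate, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return torch.cat([x, self.conv(x)], dim=1)   # shortcut via Concat

x = torch.rand(1, 16, 32, 32)
print(ResidualBlock(16)(x).shape)   # [1, 16, 32, 32] -- width unchanged
print(DenseLayer(16, 12)(x).shape)  # [1, 28, 32, 32] -- earlier features carried forward
```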