Brief Introduction to Object Detection - RCNN and YOLO
A CNN performs classification, while R-CNN performs object detection. The difference between object detection and classification algorithms is that a detection algorithm also draws a bounding box around each object of interest (localization) to locate it within the image.
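As a rough illustration (a minimal sketch assuming a recent PyTorch/torchvision install; the model choices and tensor sizes are placeholders, not the post's own code), a classifier returns only class scores for the whole image, while a detector returns bounding boxes together with labels and scores:

```python
import torch
import torchvision

# Classification: one score per class for the whole image, no location information.
classifier = torchvision.models.resnet18(weights=None)
classifier.eval()
with torch.no_grad():
    logits = classifier(torch.rand(1, 3, 224, 224))
print(logits.shape)  # [1, 1000] class scores

# Detection: a set of (box, label, score) predictions, i.e. classification + localization.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None, weights_backbone=None)
detector.eval()
with torch.no_grad():
    pred = detector([torch.rand(3, 480, 640)])[0]
print(pred["boxes"].shape)   # [N, 4] bounding boxes (x1, y1, x2, y2)
print(pred["labels"].shape)  # [N] class labels
print(pred["scores"].shape)  # [N] confidence scores
```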
ML techniques in Speech and Speaker Recognition
ML techniques in Speech and Speaker Recognition, including HMM, DNN-HMM, GMM-UBM, GMM-SVM, i-vector, x-vector, and i-vector/PLDA.
Some Practice Questions on GMM, SVM and K-means
Some Practice Questions on traditional ML methods. The topics cover SVM, K-means clustering, Gaussian Mixture Models (GMM), the EM algorithm, and Dimension Reduction approaches.
Common approaches in Dimension Reduction
Explaining the curse-of-dimensionality problem, Principal Component Analysis (PCA), Fisher Discriminant Analysis (FDA), and Linear Discriminant Analysis (LDA).
Paper Review - AnimeGAN
Studying image-to-image translation. Overview of the 2019 ISICA paper "AnimeGAN: A Novel Lightweight GAN for Photo Animation".
Paper Review - White-box Cartoonization
Studying image-to-image translation. Overview of the 2020 CVPR paper "Learning to Cartoonize Using White-box Cartoon Representations".
Paper Review - CartoonGAN
Studying image-to-image translation. Overview of the 2018 CVPR paper "CartoonGAN: Generative Adversarial Networks for Photo Cartoonization".
Why can the Transformer achieve the same tasks as Bi-LSTM and Seq2Seq models?
A Seq2Seq model takes a sequence as input and produces a sequence as output. The input and output sequences are not always of the same length; in other words, the length of the output sequence is determined by the model. Problems such as Machine Translation and Speech Recognition therefore call for a Seq2Seq model with an encoder-decoder architecture. The Transformer can handle such tasks because it also has an encoder-decoder architecture: the encoder processes the input sequence into hidden states, which provide information for the decoder to predict the output sequence. The Transformer uses an auto-regressive decoder, which feeds previously predicted outputs back in as input to generate the next prediction. In this way the Transformer can determine the output length itself and thus solve Seq2Seq problems like Speech Recognition.
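As a rough sketch of that idea (assuming a recent PyTorch; the vocabulary size, BOS/EOS token ids, and length cap are illustrative assumptions, not part of the original post), greedy auto-regressive decoding with `nn.Transformer` lets the decoder decide when to emit the end token and hence how long the output is:

```python
import torch
import torch.nn as nn

VOCAB, D_MODEL, BOS, EOS = 1000, 64, 1, 2  # assumed toy vocabulary and special tokens

embed = nn.Embedding(VOCAB, D_MODEL)
transformer = nn.Transformer(d_model=D_MODEL, nhead=4, batch_first=True)
to_vocab = nn.Linear(D_MODEL, VOCAB)

src = torch.randint(0, VOCAB, (1, 17))    # input sequence of length 17
memory = transformer.encoder(embed(src))  # encoder turns the input into hidden states

# Auto-regressive decoding: feed previously predicted tokens back in and
# stop when the model emits EOS, so the model itself decides the output length.
out = torch.tensor([[BOS]])
for _ in range(50):                        # hard cap on output length
    mask = transformer.generate_square_subsequent_mask(out.size(1))
    dec = transformer.decoder(embed(out), memory, tgt_mask=mask)
    next_token = to_vocab(dec[:, -1]).argmax(-1, keepdim=True)
    out = torch.cat([out, next_token], dim=1)
    if next_token.item() == EOS:
        break
print(out.shape)                           # output length chosen by the decoder, not by the input
```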
Some Practice Questions on common Deep Learning architectures
Some Practice Questions on common Deep Learning architectures. The topics cover Autoencoders, CNNs, RNNs, GANs, and Transformers.
ResNet and DenseNet
ResNet builds shortcut connections using Add, while DenseNet further exploits the effect of shortcut connections using Concat. Example code attached.
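For a rough feel of the difference (a minimal sketch assuming PyTorch; the channel counts and growth rate are placeholder values, not the attached example code), the two styles of shortcut connection can be written as:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """ResNet-style block: output = F(x) + x, so the channel count stays the same."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, x):
        return torch.relu(self.conv(x) + x)          # shortcut via Add

class DenseLayer(nn.Module):
    """DenseNet-style layer: output = concat(x, F(x)), so features accumulate."""
    def __init__(self, in_channels, growth_rate):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, growth_rate, 3, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return torch.cat([x, self.conv(x)], dim=1)   # shortcut via Concat

x = torch.rand(1, 16, 32, 32)
print(ResidualBlock(16)(x).shape)   # [1, 16, 32, 32] -- width unchanged
print(DenseLayer(16, 12)(x).shape)  # [1, 28, 32, 32] -- earlier features carried forward
```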