CNN - Data Augmentation
Data augmentation is a strategy that enables practitioners to significantly increase the diversity of data available for training models, without actually collecting new data. Data augmentation techniques such as cropping, padding, and horizontal flipping are commonly used to train large neural networks.
What is Data Augmentation
In deep learning, more training data generally results in a better model.
- If we have too little data, the model is more likely to overfit.
We can improve model accuracy with data augmentation.
In short, data augmentation is a preprocessing technique that generates additional image variations, which helps avoid overfitting and increases the accuracy of the model.
Variations include:
- Cropping
- Padding
- Horizontal and vertical flipping
- Shearing
- Rotation
- Zooming
- Horizontal and vertical shifting
- Note: the empty space left by a transform (e.g. after shifting or rotation) is filled in automatically, so no black/empty regions remain
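Each of the variations above maps directly to a parameter of Keras's `ImageDataGenerator`; the concrete values below are hypothetical choices for illustration, and the dummy random image stands in for real data:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Each argument corresponds to one of the variations listed above;
# the specific values are illustrative, not recommendations.
datagen = ImageDataGenerator(
    rotation_range=20,        # rotation
    width_shift_range=0.1,    # horizontal shifting
    height_shift_range=0.1,   # vertical shifting
    shear_range=0.2,          # shearing
    zoom_range=0.2,           # zooming
    horizontal_flip=True,     # horizontal flipping
    vertical_flip=True,       # vertical flipping
    fill_mode="nearest",      # fills the empty space left by a transform
)

# A dummy batch of one 32x32 RGB image stands in for real data.
images = np.random.rand(1, 32, 32, 3)
batch = next(datagen.flow(images, batch_size=1))
print(batch.shape)  # (1, 32, 32, 3)
```

Every call to the generator draws fresh random transform parameters, so the same input image yields a different variation each time.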
(Figures not shown: an original image and the augmented images generated from it.)
Benefits of Data Augmentation
- Requires much less effort to build a dataset (turns a small dataset into a large one)
- Reduces overfitting thanks to the increased variety
Disadvantages of Data Augmentation
- Augmentation runs on the CPU, not the GPU
- Generating the images can take a long time
Some Code Examples
Data Augmentation with Keras
This is a minimal example.
Import Relevant Library
```python
import keras
from keras.preprocessing.image import ImageDataGenerator
```
Fitting Model
```python
# Fitting Our Generator
model.fit_generator(datagen.flow(x_train, y_train, batch_size=32))
```
- `.fit` is used when the entire training dataset fits in memory and no data augmentation is applied.
- `.fit_generator` is used when the dataset is too large to fit in memory, or when data augmentation needs to be applied.
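The snippet above is truncated, so here is a self-contained sketch of fitting a model through the generator. The tiny model and the random dummy data are stand-ins for illustration; recent Keras versions accept the generator directly in `model.fit`, which replaces the deprecated `fit_generator`:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Dummy data and a tiny hypothetical model keep the sketch self-contained;
# real code would load an actual image dataset and a real CNN.
x_train = np.random.rand(8, 32, 32, 3)
y_train = keras.utils.to_categorical(np.random.randint(0, 2, 8), 2)

model = keras.Sequential([
    keras.layers.Flatten(input_shape=(32, 32, 3)),
    keras.layers.Dense(2, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

datagen = ImageDataGenerator(horizontal_flip=True, rotation_range=15)

# Recent Keras accepts the generator in model.fit;
# older versions used model.fit_generator instead.
history = model.fit(datagen.flow(x_train, y_train, batch_size=4),
                    steps_per_epoch=2, epochs=1, verbose=0)
```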
Data Augmentation with Tensorflow
This example is more detailed.
tf.keras.preprocessing.image.ImageDataGenerator
Import Library
```python
import tensorflow as tf
```
Datagen and Fitting Model
```python
data_augmentation = True
```
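Only the `data_augmentation = True` flag survives from the original snippet; the rest below is a hypothetical reconstruction of the usual pattern with `tf.keras.preprocessing.image.ImageDataGenerator`, including `datagen.fit`, which computes the dataset statistics that options like `featurewise_center` require:

```python
import numpy as np
import tensorflow as tf

data_augmentation = True  # flag from the original snippet

# Dummy data and a tiny hypothetical model for illustration only.
x_train = np.random.rand(8, 32, 32, 3).astype("float32")
y_train = tf.keras.utils.to_categorical(np.random.randint(0, 10, 8), 10)

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(32, 32, 3)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy")

if data_augmentation:
    datagen = tf.keras.preprocessing.image.ImageDataGenerator(
        featurewise_center=True,  # needs dataset-wide statistics
        horizontal_flip=True,
        rotation_range=15,
    )
    # datagen.fit computes the statistics (here, the mean) required by
    # featurewise_center before any batches are generated.
    datagen.fit(x_train)
    history = model.fit(datagen.flow(x_train, y_train, batch_size=4),
                        epochs=1, verbose=0)
else:
    history = model.fit(x_train, y_train, batch_size=4, epochs=1, verbose=0)
```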
Why Does Training Become Much Slower with Data Augmentation?
It is expected behavior for a model to train more slowly when data augmentation is used. Augmentation flips, rotates, and otherwise transforms images to enlarge the dataset, and this work is done on the CPU, which is slower than the GPU.
- We use augmentation not for speed but for increased accuracy.
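The CPU cost can be partly hidden with the `tf.data` API (used in the TensorFlow data-augmentation tutorial referenced below): `map` applies the augmentation in parallel on the CPU, and `prefetch` prepares the next batch while the current one trains on the GPU. The specific augmentation ops here are illustrative:

```python
import tensorflow as tf

# Dummy images and labels stand in for a real dataset.
images = tf.random.uniform((8, 32, 32, 3))
labels = tf.zeros((8,), dtype=tf.int32)

def augment(image, label):
    # Illustrative CPU-side augmentations from tf.image.
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_brightness(image, max_delta=0.1)
    return image, label

dataset = (tf.data.Dataset.from_tensor_slices((images, labels))
           .map(augment, num_parallel_calls=tf.data.AUTOTUNE)  # parallel CPU work
           .batch(4)
           .prefetch(tf.data.AUTOTUNE))  # overlap with training step

for batch_images, batch_labels in dataset:
    print(batch_images.shape)  # (4, 32, 32, 3)
```

This does not make the augmentation itself faster; it only overlaps the CPU work with GPU training so the GPU spends less time waiting.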
Reference
Tensorflow - Data augmentation