Paper Review - AnimeGAN
AnimeGAN (ISICA 2019)
Paper: AnimeGAN: A Novel Lightweight GAN for Photo Animation
Official GitHub (TensorFlow implementation): https://github.com/TachibanaYoshino/AnimeGAN
GitHub (PyTorch implementation): https://github.com/ptran1203/pytorch-animeGAN
Key features:
- Proposed three loss functions to guide the generator towards better animation visual effects:
    - grayscale style loss
    - grayscale adversarial loss
    - color reconstruction loss
- Uses the Huber loss and the $\ell_1$ loss on images in YUV format (for the color reconstruction loss)
- Uses depthwise separable convolutions and inverted residual blocks (IRBs) in the generator
- Can be trained with unpaired data
- Uses different learning rates for the generator and the discriminator
- Derives 3 datasets from the original anime dataset $S_{data}(a)$ (see the data-preparation sketch after this list):
    - $S_{data}(x)$: grayscale images of $S_{data}(a)$
    - $S_{data}(e)$: $S_{data}(a)$ but with edges removed (edge-smoothed)
    - $S_{data}(y)$: grayscale images of $S_{data}(e)$
    - The grayscale versions avoid the colors of the images in $S_{data}(a)$ influencing the colors of the generated images
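A minimal sketch of how the derived datasets could be prepared, assuming OpenCV; the edge-removal step follows CartoonGAN-style edge smoothing (detect edges, dilate, blur the edge regions), and all parameter values are illustrative rather than taken from the official data-preparation scripts.

```python
import cv2
import numpy as np

def to_grayscale_3ch(img_bgr):
    # Grayscale image replicated to 3 channels so it still fits a 3-channel network.
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)

def remove_edges(img_bgr, kernel_size=5, canny_low=100, canny_high=200):
    # CartoonGAN-style edge smoothing: find edges, dilate them,
    # then replace the dilated edge regions with a Gaussian-blurred copy.
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, canny_low, canny_high)
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    dilated = cv2.dilate(edges, kernel)
    blurred = cv2.GaussianBlur(img_bgr, (kernel_size, kernel_size), 0)
    out = img_bgr.copy()
    out[dilated != 0] = blurred[dilated != 0]
    return out

# anime image -> the three derived datasets
anime = cv2.imread("anime.jpg")              # a_i in S_data(a)
gray_anime = to_grayscale_3ch(anime)         # x_i in S_data(x)
smoothed = remove_edges(anime)               # e_i in S_data(e)
gray_smoothed = to_grayscale_3ch(smoothed)   # y_i in S_data(y)
```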
Loss functions of AnimeGAN
Adversarial loss of AnimeGAN
- The least squares loss from LSGAN is used for the adversarial loss $L_{adv}(G, D)$; the full generator and discriminator objectives are given in the total objective section below.
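A minimal PyTorch sketch of the least-squares adversarial terms, assuming the discriminator outputs raw (un-squashed) scores; the function names are illustrative, not from the official repos.

```python
import torch
import torch.nn.functional as F

def lsgan_g_loss(d_fake):
    # The generator wants D(G(p)) to be classified as real (target 1).
    return F.mse_loss(d_fake, torch.ones_like(d_fake))

def lsgan_d_loss(d_real, d_fake):
    # The discriminator pushes real anime images towards 1 and fakes towards 0.
    # AnimeGAN's full discriminator loss adds further fake terms for the
    # grayscale and edge-smoothed anime images (see the total objective below).
    return (F.mse_loss(d_real, torch.ones_like(d_real)) +
            F.mse_loss(d_fake, torch.zeros_like(d_fake)))
```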
Content loss of AnimeGAN
Content loss is introduced to ensure that the generated images retain the semantic content of the input photos. AnimeGAN uses high-level feature maps from a VGG network pre-trained on ImageNet, which preserve the content of objects:

$$L_{con}(G, D) = E_{p_i \sim S_{data}(p)}\left[\left\lVert VGG_l(p_i) - VGG_l(G(p_i)) \right\rVert_1\right]$$
Where:
- $VGG_l(\cdot)$ refers to the feature maps of a specific layer $l$ of the VGG network.
- The paper uses the feature maps of the `conv4_4` layer (same as CartoonGAN and WBCartoonization).
- $p_i$ is a photo from $S_{data}(p)$.
- $G(p_i)$ is the fake anime-style image generated from the photo $p_i$.
- $\ell_1$ sparse regularization is used here.
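A minimal PyTorch sketch of the content loss, assuming a VGG19 backbone where `conv4_4` sits at index 25 of `torchvision`'s `vgg19().features` (worth verifying against the layer list); ImageNet input normalization is omitted for brevity.

```python
import torch
import torch.nn.functional as F
import torchvision.models as models

# Frozen VGG19 feature extractor up to conv4_4 (index 25, an assumption to verify).
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features[:26].eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def content_loss(photo, generated):
    # l1 (sparse) distance between high-level VGG feature maps,
    # so the generated image keeps the semantic content of the photo.
    return F.l1_loss(vgg(generated), vgg(photo))
```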
Grayscale style loss of AnimeGAN
The Gram matrix is used to obtain more vivid style images: AnimeGAN matches Gram matrices of VGG feature maps so that the generated images adopt the texture of the anime images rather than their color (which is why grayscale anime images are used):

$$L_{gra}(G, D) = E_{p_i \sim S_{data}(p)},\, E_{x_i \sim S_{data}(x)}\left[\left\lVert Gram(VGG_l(G(p_i))) - Gram(VGG_l(x_i)) \right\rVert_1\right]$$
Where:
- $VGG_l(\cdot)$ refers to the feature maps of a specific layer $l$ of the VGG network.
- The paper uses the feature maps of the `conv4_4` layer (same as CartoonGAN and WBCartoonization).
- $Gram(\cdot)$ denotes the Gram matrix of the feature maps.
- $p_i$ is a photo from $S_{data}(p)$.
- $x_i$ is a grayscale anime image from $S_{data}(x)$.
- $G(p_i)$ is the fake anime-style image generated from the photo $p_i$.
- $\ell_1$ sparse regularization is used here.
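A sketch of the Gram-matrix style term, reusing the `vgg` extractor from the content-loss sketch; the Gram-matrix normalization is a common convention, not necessarily the paper's exact one.

```python
import torch
import torch.nn.functional as F

def gram(features):
    # Gram matrix: channel-by-channel correlations of the feature maps,
    # which capture texture while discarding spatial layout.
    b, c, h, w = features.shape
    f = features.view(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def grayscale_style_loss(vgg, generated, gray_anime):
    # gray_anime is x_i from S_data(x): only the texture, not the color,
    # of the anime domain is transferred to the generated image.
    return F.l1_loss(gram(vgg(generated)), gram(vgg(gray_anime)))
```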
Color reconstruction loss of AnimeGAN
Images are converted from RGB to YUV format for the color reconstruction loss:

$$L_{col}(G, D) = E_{p_i \sim S_{data}(p)}\left[\left\lVert Y(G(p_i)) - Y(p_i) \right\rVert_1 + \left\lVert U(G(p_i)) - U(p_i) \right\rVert_H + \left\lVert V(G(p_i)) - V(p_i) \right\rVert_H\right]$$
Where:
- $Y(\cdot)$, $U(\cdot)$, $V(\cdot)$ denote the three channels of an image in YUV format.
- $\lVert \cdot \rVert_H$ represents the Huber loss.
- $\ell_1$ loss is used for the $Y$ channel.
- The Huber loss is used for the $U$ and $V$ channels.
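A sketch of the color reconstruction loss in PyTorch; the RGB-to-YUV coefficients are standard BT.601-style values and may differ slightly from the official implementation, and `smooth_l1_loss` stands in for the Huber loss.

```python
import torch
import torch.nn.functional as F

def rgb_to_yuv(img):
    # img: (B, 3, H, W) in [0, 1]; BT.601-style coefficients (an assumption,
    # implementations differ slightly in the exact constants).
    r, g, b = img[:, 0], img[:, 1], img[:, 2]
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = -0.147 * r - 0.289 * g + 0.436 * b
    v = 0.615 * r - 0.515 * g - 0.100 * b
    return y, u, v

def color_loss(photo, generated):
    # l1 on luminance (Y), Huber (smooth L1) on chrominance (U, V).
    yp, up, vp = rgb_to_yuv(photo)
    yg, ug, vg = rgb_to_yuv(generated)
    return (F.l1_loss(yg, yp) +
            F.smooth_l1_loss(ug, up) +
            F.smooth_l1_loss(vg, vp))
```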
Total Objective function of AnimeGAN
Generator

$$L(G) = \omega_{adv} E_{p_i \sim S_{data}(p)}\left[(D(G(p_i)) - 1)^2\right] + \omega_{con} L_{con}(G, D) + \omega_{gra} L_{gra}(G, D) + \omega_{col} L_{col}(G, D)$$

Discriminator

$$L(D) = \omega_{adv}\left[E_{a_i \sim S_{data}(a)}\left[(D(a_i) - 1)^2\right] + E_{p_i \sim S_{data}(p)}\left[D(G(p_i))^2\right] + E_{x_i \sim S_{data}(x)}\left[D(x_i)^2\right] + 0.1\, E_{y_i \sim S_{data}(y)}\left[D(y_i)^2\right]\right]$$

- The 0.1 scaling factor on the last term (the grayscale edge-smoothed images $y_i$) is applied to avoid the edges of the generated images being too sharp.

Total

$$L(G, D) = \omega_{adv} L_{adv}(G, D) + \omega_{con} L_{con}(G, D) + \omega_{gra} L_{gra}(G, D) + \omega_{col} L_{col}(G, D)$$

where the paper set the weight factors $\omega_{adv} = 300$, $\omega_{con} = 1.5$, $\omega_{gra} = 3$, and $\omega_{col} = 10$.
With a larger $\omega_{con}$ (relative to $\omega_{gra}$), the generated images have more realistic content, but the animation style is not obvious. Therefore, with $\omega_{con} = 1.5$ and $\omega_{gra} = 3$, the images generated by AnimeGAN have the satisfactory animated visual effects.
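Putting the terms together with the paper's weight factors; `content_loss`, `grayscale_style_loss`, `color_loss`, and `vgg` refer to the hypothetical helpers sketched in the previous sections.

```python
import torch
import torch.nn.functional as F

w_adv, w_con, w_gra, w_col = 300.0, 1.5, 3.0, 10.0  # the paper's weight factors

def generator_loss(d_fake, photo, generated, gray_anime):
    # d_fake = D(G(photo)); the other terms reuse the earlier sketches.
    return (w_adv * F.mse_loss(d_fake, torch.ones_like(d_fake)) +
            w_con * content_loss(photo, generated) +
            w_gra * grayscale_style_loss(vgg, generated, gray_anime) +
            w_col * color_loss(photo, generated))

def discriminator_loss(d_real, d_fake, d_gray, d_edge):
    # d_real = D(a_i), d_fake = D(G(p_i)), d_gray = D(x_i), d_edge = D(y_i).
    # Grayscale and edge-smoothed anime images are treated as fakes; the edge
    # term is scaled by 0.1 so generated edges do not become too sharp.
    return w_adv * (
        F.mse_loss(d_real, torch.ones_like(d_real)) +
        F.mse_loss(d_fake, torch.zeros_like(d_fake)) +
        F.mse_loss(d_gray, torch.zeros_like(d_gray)) +
        0.1 * F.mse_loss(d_edge, torch.zeros_like(d_edge)))
```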
Architecture of AnimeGAN
Refer to the paper’s Figure 1. The generator is an encoder-decoder style network whose main building blocks are standard convolutions, depthwise separable convolutions, and inverted residual blocks (IRBs).
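A schematic PyTorch sketch of the two building blocks named above: a depthwise separable convolution and a MobileNetV2-style inverted residual block. Channel counts, normalization (instance norm), activations, and the expansion factor are illustrative choices, not the exact configuration from the paper.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    # Depthwise conv (one filter per channel) followed by a 1x1 pointwise conv:
    # far fewer parameters than a standard convolution of the same shape.
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, 3, stride, 1, groups=in_ch, bias=False),
            nn.InstanceNorm2d(in_ch),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.InstanceNorm2d(out_ch),
            nn.LeakyReLU(0.2, inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class InvertedResidualBlock(nn.Module):
    # Expand with a 1x1 conv, filter with a depthwise conv, project back down,
    # with a residual connection since input and output shapes match.
    def __init__(self, channels, expansion=2):
        super().__init__()
        hidden = channels * expansion
        self.block = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.InstanceNorm2d(hidden),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(hidden, hidden, 3, 1, 1, groups=hidden, bias=False),
            nn.InstanceNorm2d(hidden),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),
            nn.InstanceNorm2d(channels),
        )

    def forward(self, x):
        return x + self.block(x)
```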
Training Detail of AnimeGAN
- Initialization phase: the generator alone is pre-trained with the content loss (learning rate = 0.0001, Adam optimizer) to help the adversarial training converge.
- Training phase:
    - Generator: learning rate = 0.00008, Adam optimizer
    - Discriminator: learning rate = 0.00016, Adam optimizer
- Training epochs = 100
- Batch size = 4
- Training image size is 256 × 256
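A sketch of the two-phase schedule with the learning rates above; `G`, `D`, `photo_loader`, and `content_loss` are hypothetical placeholders from the earlier sketches.

```python
import torch

# Optimizers mirroring the schedule above.
init_optim = torch.optim.Adam(G.parameters(), lr=1e-4)  # initialization phase
g_optim = torch.optim.Adam(G.parameters(), lr=8e-5)     # training phase, generator
d_optim = torch.optim.Adam(D.parameters(), lr=1.6e-4)   # training phase, discriminator

# Initialization phase: pre-train the generator with the content loss only,
# which helps the subsequent adversarial training converge.
for photo in photo_loader:
    init_optim.zero_grad()
    content_loss(photo, G(photo)).backward()
    init_optim.step()
```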
Some suggestions from the author’s GitHub:
- Since the real photos in the training set are all landscape photos, if you want to stylize photos with people as the main subject, add at least 3000 photos of people to the training set and retrain to obtain a new model.
- To obtain a better face animation effect, when using two images as a data pair for training, it is suggested that the faces in the photos and the faces in the anime-style data be as consistent as possible in terms of gender.
- The generated stylized images are affected by the overall brightness and tone of the style data, so try not to select anime images of night scenes as the style data, and apply exposure compensation to the style data as a whole to promote consistent brightness across the entire style dataset.
AnimeGAN v2
Official GitHub (TensorFlow implementation): https://github.com/TachibanaYoshino/AnimeGANv2
GitHub (PyTorch implementation): https://github.com/bryandlee/animegan2-pytorch
Key features compared to AnimeGAN:
- AnimeGANv2 adds a total variation loss to the generator loss, which solves the problem of high-frequency artifacts in the generated images (see the sketch after this list).
- It is easier to train and can directly achieve the effects shown in the paper.
- It further reduces the number of parameters of the generator network (generator size: 8.17 MB); the lite version has an even smaller generator model.
- It uses new high-quality style data, which comes from BD (Blu-ray) movies as much as possible.
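A common anisotropic total-variation penalty in PyTorch, sketched for illustration; AnimeGANv2's exact formulation (e.g., absolute vs. squared differences, normalization) may differ.

```python
import torch

def total_variation_loss(img):
    # Mean absolute difference between neighbouring pixels along height and
    # width; penalizing it suppresses high-frequency artifacts in the output.
    dh = (img[:, :, 1:, :] - img[:, :, :-1, :]).abs().mean()
    dw = (img[:, :, :, 1:] - img[:, :, :, :-1]).abs().mean()
    return dh + dw
```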