Paper Review - Pix2Pix, CycleGAN
Conditional-GAN
The objective of a conditional GAN:

$$\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}[\log D(x, y)] + \mathbb{E}_{x,z}[\log(1 - D(x, G(x, z)))]$$

where $G$ tries to minimize this objective against an adversarial $D$ that tries to maximize it, i.e.

$$G^* = \arg\min_G \max_D \mathcal{L}_{cGAN}(G, D)$$
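As a concrete reading of this objective, here is a minimal PyTorch sketch (the conditional models `D(x, y)` and `G(x, z)` are assumed placeholders, not the paper's networks):

```python
import torch
import torch.nn.functional as F

def cgan_losses(D, G, x, y, z):
    """Conditional GAN losses: D scores the input x together with either
    a real target y or a generated G(x, z)."""
    fake = G(x, z)
    # Discriminator: maximize log D(x, y) + log(1 - D(x, G(x, z)))
    d_real = D(x, y)              # logits for (input, real target)
    d_fake = D(x, fake.detach())  # detach so G gets no gradient here
    d_loss = F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) \
           + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake))
    # Generator: maximize log D(x, G(x, z)) (non-saturating form)
    g_logits = D(x, fake)
    g_loss = F.binary_cross_entropy_with_logits(g_logits, torch.ones_like(g_logits))
    return d_loss, g_loss
```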
Pix2Pix
Paper: Image-to-Image Translation with Conditional Adversarial Networks (CVPR 2017)
Official Github: https://github.com/phillipi/pix2pix
Key features of Pix2Pix:
- Requires paired images for training
Loss function of Pix2Pix
- Learns not only the mapping from input image to output image, but also a loss function to train this mapping.
- The loss function is not hand-engineered
If we take a naïve approach and ask the CNN to minimize the Euclidean distance between predicted and ground truth pixels, it will tend to produce blurry results. This is because Euclidean distance is minimized by averaging all plausible outputs, which causes blurring.
It would be highly desirable if we could instead specify only a high-level goal, like “make the output indistinguishable from reality”, and then automatically learn a loss function appropriate for satisfying this goal.
The objective of Pix2Pix:

$$G^* = \arg\min_G \max_D \mathcal{L}_{cGAN}(G, D) + \lambda \mathcal{L}_{L1}(G)$$

where

$$\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}[\|y - G(x, z)\|_1]$$
Previous approaches have found it beneficial to mix the GAN objective with a more traditional loss, such as L2 distance. The discriminator’s job remains unchanged, but the generator is tasked to not only fool the discriminator but also to be near the ground truth output in an L2 sense. The paper found that using L1 distance encourages less blurring.
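A minimal sketch of this mixed generator loss in PyTorch, assuming a discriminator `D(x, fake)` that returns logits and the paper's $\lambda = 100$:

```python
import torch
import torch.nn.functional as F

LAMBDA_L1 = 100.0  # weight used in the paper

def pix2pix_g_loss(D, x, fake, y):
    """Generator loss: fool the discriminator AND stay close to the
    ground truth in an L1 sense."""
    logits = D(x, fake)
    gan_loss = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    l1_loss = F.l1_loss(fake, y)
    return gan_loss + LAMBDA_L1 * l1_loss
```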
Architecture of Pix2Pix
Both generator and discriminator use modules of the form Convolution-BatchNorm-ReLU (LeakyReLU with slope 0.2 in the encoder and discriminator, plain ReLU in the decoder).
Generator of Pix2Pix
The generator is a variant of U-Net.
- The U-Net is an encoder-decoder with skip connections between mirrored layers in the encoder and decoder stacks.
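To illustrate the skip connections, here is a toy two-level encoder-decoder sketch (not the paper's exact architecture, which has 8 down/up-sampling levels):

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Toy 2-level U-Net to illustrate skip connections; the real
    Pix2Pix generator is much deeper."""
    def __init__(self):
        super().__init__()
        self.enc1 = nn.Conv2d(3, 64, 4, stride=2, padding=1)    # H -> H/2
        self.enc2 = nn.Conv2d(64, 128, 4, stride=2, padding=1)  # H/2 -> H/4
        self.dec2 = nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1)
        # input channels doubled: upsampled features + skip from enc1
        self.dec1 = nn.ConvTranspose2d(64 + 64, 3, 4, stride=2, padding=1)

    def forward(self, x):
        e1 = torch.relu(self.enc1(x))
        e2 = torch.relu(self.enc2(e1))
        d2 = torch.relu(self.dec2(e2))
        d1 = self.dec1(torch.cat([d2, e1], dim=1))  # skip connection
        return torch.tanh(d1)
```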
Discriminator of Pix2Pix
The Discriminator model is PatchGAN.
- The PatchGAN only penalizes structure at the scale of patches. This discriminator tries to classify whether each $N \times N$ patch in an image is real or fake. The discriminator is run convolutionally across the image, averaging all responses to provide the ultimate output of $D$.
- $N$ can be much smaller than the full size of the image and still produce high-quality results. This is advantageous because a smaller PatchGAN has fewer parameters, runs faster, and can be applied to arbitrarily large images.
- The paper found that a $70 \times 70$ PatchGAN gives the best results, while lower values generate artifacts.
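A minimal PatchGAN-style discriminator sketch (the channel and layer counts are illustrative, not the exact $70 \times 70$ configuration):

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Fully convolutional discriminator: outputs one logit per patch
    instead of a single scalar for the whole image."""
    def __init__(self, in_ch=6):  # input image + target image concatenated
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 1, 4, stride=1, padding=1),  # patch logits
        )

    def forward(self, x, y):
        # Output shape (N, 1, H', W'): one response per patch; average
        # (or apply BCE per patch) to get the final discriminator output.
        return self.net(torch.cat([x, y], dim=1))
```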
Training Details of Pix2Pix
- Alternate between one gradient descent step on $D$, then one step on $G$. $G$ is trained to maximize $\log D(x, G(x, z))$, as suggested in the original GAN paper
- The objective is divided by 2 while optimizing $D$, which slows down the rate at which $D$ learns relative to $G$
- Uses Adam optimizer (learning rate = 0.0002, beta1 = 0.5, beta2 = 0.999)
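Putting these details together, a sketch of one alternating training step (`G`, `D`, and `pix2pix_g_loss` are assumed from the earlier sketches):

```python
import torch
import torch.nn.functional as F

# Assumed: G, D are defined models; pix2pix_g_loss is the sketch above.
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

def d_loss_fn(D, x, y, fake):
    real_logits, fake_logits = D(x, y), D(x, fake)
    return F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) \
         + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))

def train_step(x, y):
    fake = G(x)
    # One step on D; the D objective is halved to slow D relative to G
    opt_D.zero_grad()
    (0.5 * d_loss_fn(D, x, y, fake.detach())).backward()
    opt_D.step()
    # One step on G (adversarial + L1, as sketched earlier)
    opt_G.zero_grad()
    pix2pix_g_loss(D, x, fake, y).backward()
    opt_G.step()
```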
CycleGAN
Paper: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (ICCV 2017)
Official Github: https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix
Key features of CycleGAN:
- Requires unpaired images for training
- Uses 2 Generators and 2 Discriminators
- Learns both the direct mapping $G: X \rightarrow Y$ and the inverse mapping $F: Y \rightarrow X$
- Given any two unordered image collections $X$ and $Y$, the algorithm learns to automatically "translate" an image from one domain into the other and vice versa.
- A cycle-consistency loss is introduced to enforce $F(G(x)) \approx x$ (and vice versa, $G(F(y)) \approx y$).
For many tasks, paired training data is not available; this approach can learn to translate an image from a source domain to a target domain in the absence of paired examples.
Loss function of CycleGAN
Adversarial Loss of CycleGAN
Note that there is a second discriminator $D_X$ and a second, architecturally identical generator $F$ trained with the analogous objective for the inverse mapping.

The paper first states the adversarial loss part in the objective of CycleGAN as:

$$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))]$$

which is the same equation the normal GAN uses. In implementation, it is a BCE-with-logits loss.
However, in a later part the paper says they used the loss from LSGAN for the adversarial loss.

Therefore the adversarial loss part in the objective of CycleGAN should be:

$$\mathcal{L}_{LSGAN}(D_Y) = \mathbb{E}_{y \sim p_{data}(y)}[(D_Y(y) - 1)^2] + \mathbb{E}_{x \sim p_{data}(x)}[(D_Y(G(x)) - 0)^2]$$

$$\mathcal{L}_{LSGAN}(G) = \mathbb{E}_{x \sim p_{data}(x)}[(D_Y(G(x)) - 1)^2]$$

where:
- $0$ is the label for fake samples
- $1$ is the label for real samples
- the $1$ in the generator term denotes the value that the Generator wants the Discriminator to believe for a fake sample
In implementation, LSGAN uses an MSE loss (without a sigmoid in the discriminator).
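A minimal sketch of these LSGAN losses as MSE against the 0/1 labels (the discriminator `D_Y` is assumed to output raw scores with no sigmoid):

```python
import torch
import torch.nn.functional as F

def lsgan_d_loss(D_Y, real_y, fake_y):
    """Discriminator: push scores for real samples toward 1 and fakes toward 0."""
    real_score = D_Y(real_y)
    fake_score = D_Y(fake_y.detach())  # no gradient into the generator here
    return F.mse_loss(real_score, torch.ones_like(real_score)) \
         + F.mse_loss(fake_score, torch.zeros_like(fake_score))

def lsgan_g_loss(D_Y, fake_y):
    """Generator: push scores for its fakes toward 1, the 'real' label."""
    fake_score = D_Y(fake_y)
    return F.mse_loss(fake_score, torch.ones_like(fake_score))
```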
Cycle-Consistency Loss of CycleGAN
Cycle-Consistency Loss learns an inverse mapping from the output domain back to the input and checks whether the input can be reconstructed:

$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\|F(G(x)) - x\|_1] + \mathbb{E}_{y \sim p_{data}(y)}[\|G(F(y)) - y\|_1]$$

In implementation, it is an L1 loss.
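A sketch of this loss, assuming generators `G` ($X \rightarrow Y$) and `F_inv` ($Y \rightarrow X$; renamed to avoid clashing with the usual `torch.nn.functional` alias `F`):

```python
import torch.nn.functional as F

def cycle_consistency_loss(G, F_inv, real_x, real_y):
    """Round-trip reconstruction error in both directions, measured with L1."""
    forward_cycle = F.l1_loss(F_inv(G(real_x)), real_x)   # x -> G(x) -> F(G(x)) ~ x
    backward_cycle = F.l1_loss(G(F_inv(real_y)), real_y)  # y -> F(y) -> G(F(y)) ~ y
    return forward_cycle + backward_cycle
```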
Full Objective of CycleGAN
Therefore the full objective of CycleGAN:

$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \mathcal{L}_{cyc}(G, F)$$

- where $\lambda = 10$, according to the paper
- $\mathcal{L}_{GAN}$ here is the adversarial loss; its exact functional form depends on which adversarial loss you use (vanilla GAN or LSGAN)
“I think LSGAN is a more stable loss compared to vanilla GAN. It has a better gradient property. You are free to use LSGAN in your task. Maybe you want to change --lambda_L1 to 10 or 25, as LSGAN’s GAN loss has a larger range compared to vanilla GANs.”
Identity Loss of CycleGAN
For photo generation from paintings, the Identity Loss encourages the mapping to preserve color composition between the input and output. In implementation, it is an L1 loss:

$$\mathcal{L}_{identity}(G, F) = \mathbb{E}_{y \sim p_{data}(y)}[\|G(y) - y\|_1] + \mathbb{E}_{x \sim p_{data}(x)}[\|F(x) - x\|_1]$$

Without $\mathcal{L}_{identity}$, the generators $G$ and $F$ are free to change the tint of input images when there is no need to.

So we do not need to use this loss when we don't care about the coloring.
We can set up our total generator loss with a formula like the following, so we can tweak the weights easily:

$$\mathcal{L}_{G} = \mathcal{L}_{GAN} + \lambda_{cycle} \mathcal{L}_{cyc} + \lambda_{identity} \mathcal{L}_{identity}$$
Here is an example of a Generator loss implementation (a minimal sketch reusing the LSGAN helpers above; `G`, `F_inv`, `D_X`, `D_Y` are the two generators and two discriminators):
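```python
import torch.nn.functional as F

lambda_cycle = 10.0     # cycle-consistency weight from the paper
lambda_identity = 0.5   # identity weight; set to 0 to disable the identity loss

def generator_loss(G, F_inv, D_X, D_Y, real_x, real_y):
    """Total generator loss: LSGAN + cycle-consistency + identity."""
    fake_y = G(real_x)
    fake_x = F_inv(real_y)
    # Adversarial terms (LSGAN): each generator tries to look real to its discriminator
    gan = lsgan_g_loss(D_Y, fake_y) + lsgan_g_loss(D_X, fake_x)
    # Cycle-consistency terms: round-trip reconstruction in both directions
    cyc = F.l1_loss(F_inv(fake_y), real_x) + F.l1_loss(G(fake_x), real_y)
    # Identity terms: feed a generator an image already in its output domain
    idt = F.l1_loss(G(real_y), real_y) + F.l1_loss(F_inv(real_x), real_x)
    return gan + lambda_cycle * cyc + lambda_identity * idt
```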
- Since Identity Loss is optional, we can set lambda_identity to 0 when identity loss is not used.
Architecture of CycleGAN
- There are two generators, $G: X \rightarrow Y$ and $F: Y \rightarrow X$
- There are two discriminators, $D_X$ and $D_Y$
- Generator networks are ResNets with 9 residual blocks (a U-Net will also give a good result)
- Discriminator networks are PatchGANs (same as Pix2Pix)
- InstanceNorm instead of BatchNorm everywhere
- ReLU is used only in the generator; the discriminators use LeakyReLU
- Reflection padding was used to reduce artifacts
Training Details of CycleGAN
- Replaced the negative log likelihood objective by a least-squares loss in $\mathcal{L}_{GAN}$
  - More stable during training and generates higher quality results
- Batch size of 1 (could be because 2 discriminators + 2 generators take more VRAM)
- Adam optimizer (learning rate = 0.0002)
- Keep the same learning rate for the first 100 epochs and linearly decay the rate to 0 over the next 100 epochs
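A sketch of that learning-rate schedule with PyTorch's `LambdaLR` (the optimizer setup is an assumption consistent with the details above):

```python
import torch

# Assumed: G and F_inv are the two generator modules defined elsewhere.
opt = torch.optim.Adam(
    list(G.parameters()) + list(F_inv.parameters()),
    lr=2e-4, betas=(0.5, 0.999),
)

def lr_lambda(epoch):
    # Flat for the first 100 epochs, then linear decay to 0 over the next 100.
    return 1.0 - max(0, epoch - 100) / 100.0

scheduler = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=lr_lambda)
# Call scheduler.step() once per epoch after training on all batches.
```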
Limitations of CycleGAN
- The results are far from uniformly positive
  - On translation tasks that involve color and texture changes, the method often succeeds
  - On translation tasks that require geometric changes, the method has little success (e.g. dog → cat transfiguration). The learned translation degenerates into making minimal changes to the input. This failure might be caused by generator architectures that are tailored for good performance on appearance changes.
- CycleGAN is more memory-intensive than pix2pix, as it requires two generators and two discriminators.
- Simultaneously training two GAN models often converges slowly, resulting in a time-consuming training process.
Problem of Cycle-consistency
- Cycle-consistency assumes that the relationship between the two domains is a bijection, which is often too restrictive. Perfect reconstruction is difficult to achieve, especially when images from one domain have additional information compared to the other domain.
Problem of L1 Loss
- L1 loss is a per-pixel reconstruction metric. It does not reflect human perceptual preferences and can lead to blurry results (even though L1 produces much less blurry results than L2).