Paper Review - StyleGANv2
StyleGAN2
Paper: Analyzing and Improving the Image Quality of StyleGAN
Key features:
- Goodbye AdaIN
- Goodbye Progressive Training
Limitations of StyleGAN
- Droplet (blob-shaped) artifacts are caused by instance normalization (AdaIN)
- The artifacts appear across all intermediate feature maps
- The StyleGAN2 paper points out that the AdaIN operation normalizes the mean and variance of each feature map separately, thereby potentially destroying any information found in the magnitudes of the features relative to each other.
Even when the droplet may not be obvious in the final image, it is present in the intermediate feature maps of the generator. The anomaly starts to appear around 64×64 resolution, is present in all feature maps, and becomes progressively stronger at higher resolutions. The existence of such a consistent artifact is puzzling, as the discriminator should be able to detect it.
We pinpoint the problem to the AdaIN operation that normalizes the mean and variance of each feature map separately, thereby potentially destroying any information found in the magnitudes of the features relative to each other. We hypothesize that the droplet artifact is a result of the generator intentionally sneaking signal strength information past instance normalization: by creating a strong, localized spike that dominates the statistics, the generator can effectively scale the signal as it likes elsewhere. Our hypothesis is supported by the finding that when the normalization step is removed from the generator, the droplet artifacts disappear completely.
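To make the normalization step concrete, here is a minimal NumPy sketch of an AdaIN-style operation (illustrative only, not the StyleGAN code; the shapes are made up): each feature map is normalized to zero mean and unit variance on its own before the style's scale and bias are applied, which is exactly where the relative magnitudes between feature maps get erased.

```python
import numpy as np

def adain(x, style_scale, style_bias, eps=1e-8):
    """Illustrative AdaIN.

    x:            activations of shape (N, C, H, W)
    style_scale:  per-channel scales of shape (N, C)
    style_bias:   per-channel biases of shape (N, C)
    """
    mean = x.mean(axis=(2, 3), keepdims=True)       # per-sample, per-channel mean
    std = x.std(axis=(2, 3), keepdims=True) + eps   # per-sample, per-channel std
    x_norm = (x - mean) / std                       # relative channel magnitudes are lost here
    return style_scale[:, :, None, None] * x_norm + style_bias[:, :, None, None]

x = np.random.randn(1, 4, 8, 8)
y = adain(x, style_scale=np.ones((1, 4)), style_bias=np.zeros((1, 4)))
# Every channel of y now has (roughly) zero mean and unit variance,
# regardless of how strong or weak that channel was in x.
```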
- Progressive growing appears to have a strong location preference for details like teeth and eyes (they do not move no matter how the latent is interpolated)
- Features stay in one place before quickly jumping to the next preferred location
- This is because the position of these details is already determined during low-resolution training
Weight Modulation and Demodulation
- Goodbye, AdaIN.
- The AdaIN operator is removed and replaced with a weight modulation and demodulation step
- The purpose of instance normalization is to remove the effect of the style scale s from the statistics of the convolution's output feature maps.
- This goal can be achieved more directly by weight modulation and demodulation
- Weight Modulation and Demodulation could solve the droplet artifacts
Interestingly, the original StyleGAN applies bias and noise within the style block, causing their relative impact to be inversely proportional to the current style's magnitudes. We observe that more predictable results are obtained by moving these operations outside the style block, where they operate on normalized data. Furthermore, we notice that after this change it is sufficient for the normalization and modulation to operate on the standard deviation alone (i.e., the mean is not needed). The application of bias, noise, and normalization to the constant input can also be safely removed without observable drawbacks.
The modulation scales each input feature map of the convolution based on the incoming style, which can alternatively be implemented by scaling the convolution weights:

$$w'_{ijk} = s_i \cdot w_{ijk}$$

where:
- $w$ and $w'$ are the original and modulated weights respectively
- $s_i$ is the scale corresponding to the $i$-th input feature map
- $j$ enumerates the output feature maps of the convolution
- $k$ enumerates the spatial footprint of the convolution
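A quick sanity check of this equivalence (NumPy, with a 1×1 convolution written as an einsum so the arithmetic stays obvious; the shapes and values are made up for the example): scaling the input feature maps by $s_i$ gives exactly the same output as folding $s_i$ into the weights.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 16, 16))   # 8 input feature maps of size 16x16 (made-up sizes)
w = rng.normal(size=(4, 8))        # 1x1 convolution: 4 output maps, 8 input maps
s = rng.normal(size=(8,))          # per-input-channel style scales

# Option A: modulate the activations, then convolve.
out_a = np.einsum('oi,ihw->ohw', w, s[:, None, None] * x)

# Option B: modulate the weights instead (w'_oi = s_i * w_oi), then convolve the raw input.
out_b = np.einsum('oi,ihw->ohw', s[None, :] * w, x)

print(np.allclose(out_a, out_b))   # True: the style can be folded into the weights
```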
By assuming that the random variables in the input activations are i.i.d. with unit standard deviation, the output activations after modulation and convolution have standard deviation:

$$\sigma_j = \sqrt{\sum_{i,k} {w'_{ijk}}^2}$$
i.e., the outputs are scaled by the L2 norm of the corresponding weights. The subsequent normalization (demodulation) aims to restore the outputs back to unit standard deviation, which can again be baked into the convolution weights:

$$w''_{ijk} = \frac{w'_{ijk}}{\sqrt{\sum_{i,k} {w'_{ijk}}^2 + \epsilon}}$$

where:
- $\epsilon$ is a small constant to avoid numerical issues.

Now, instead of normalizing the feature maps, we adjust the convolution weights directly from the style $s$ using the modulation and demodulation equations above.
Compared to instance normalization, our demodulation technique is weaker because it is based on statistical assumptions about the signal instead of actual contents of the feature maps. Similar statistical analysis has been extensively used in modern network initializers, but we are not aware of it being previously used as a replacement for data-dependent normalization. Our demodulation is also related to weight normalization that performs the same calculation as a part of reparameterizing the weight tensor. Prior work has identified weight normalization as beneficial in the context of GAN training.
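Below is a minimal NumPy sketch of the combined weight modulation / demodulation step as described by the two equations above. It is not the official implementation (which folds everything into a grouped convolution for efficiency), just the weight arithmetic.

```python
import numpy as np

def modulate_demodulate(w, s, eps=1e-8):
    """Weight modulation / weight demodulation (illustrative sketch).

    w: convolution weights of shape (out_ch, in_ch, kh, kw)
    s: per-input-channel style scales of shape (in_ch,)
    Returns w'' of the same shape, ready to be used in a plain convolution.
    """
    # Modulation: w'_ijk = s_i * w_ijk
    w_mod = w * s[None, :, None, None]
    # Demodulation: divide by sigma_j = sqrt(sum_{i,k} w'_ijk^2 + eps),
    # the expected output std under unit-variance i.i.d. inputs.
    sigma = np.sqrt(np.sum(w_mod ** 2, axis=(1, 2, 3), keepdims=True) + eps)
    return w_mod / sigma

w = np.random.randn(64, 32, 3, 3)          # made-up layer size
s = 1.0 + 0.5 * np.random.randn(32)
w_dd = modulate_demodulate(w, s)
print(np.sqrt((w_dd ** 2).sum(axis=(1, 2, 3))))   # ~1.0 per output map
# Bias and per-pixel noise are then added outside this step, on the
# (statistically) normalized activations, as noted above.
```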
No Progressive Growing
- Goodbye, Progressive Training
New: Projecting images to latent space (Delatent)
- Find the latent code that reproduces a given image
- Useful for image forensics (determine whether a given image is generated or real)
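As a rough sketch of what projection looks like in practice (PyTorch, heavily simplified: the actual StyleGAN2 projector also optimizes the noise maps and uses an LPIPS perceptual distance plus noise regularization, while here a plain pixel MSE and a made-up `generator` interface stand in):

```python
import torch

def project(generator, target, num_steps=1000, lr=0.01, w_dim=512):
    """Find a latent w whose generated image approximates `target` (simplified sketch).

    Assumes `generator` maps a latent of shape (1, w_dim) to an image tensor
    with the same shape as `target`.
    """
    w = torch.zeros(1, w_dim, requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(num_steps):
        opt.zero_grad()
        loss = torch.mean((generator(w) - target) ** 2)   # pixel MSE stand-in for LPIPS
        loss.backward()
        opt.step()
    return w.detach()

# Forensics use: if generator(project(generator, img)) is very close to img,
# img was likely produced by this generator; real photographs project poorly.
```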