StyleGAN2

Paper: Analyzing and Improving the Image Quality of StyleGAN

Key features:

  • Goodbye AdaIN
  • Goodbye Progressive Training

Limitations of StyleGAN

  • Droplet (blob-like) artifacts are caused by instance normalization (AdaIN)
    • The artifacts appear across all intermediate feature maps (activations)
    • The StyleGAN2 paper points out that the AdaIN operation normalizes the mean and variance of each feature map separately, thereby potentially destroying any information found in the magnitudes of the features relative to each other.

img

Even when the droplet may not be obvious in the final image, it is present in the intermediate feature maps of the generator. The anomaly starts to appear around 64×64 resolution, is present in all feature maps, and becomes progressively stronger at higher resolutions. The existence of such a consistent artifact is puzzling, as the discriminator should be able to detect it.

We pinpoint the problem to the AdaIN operation that normalizes the mean and variance of each feature map separately, thereby potentially destroying any information found in the magnitudes of the features relative to each other. We hypothesize that the droplet artifact is a result of the generator intentionally sneaking signal strength information past instance normalization: by creating a strong, localized spike that dominates the statistics, the generator can effectively scale the signal as it likes elsewhere. Our hypothesis is supported by the finding that when the normalization step is removed from the generator, the droplet artifacts disappear completely.
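
To make the "destroying information in relative magnitudes" point concrete, here is a small illustrative check (shapes and numbers are arbitrary, not from the paper): instance normalization, the normalization half of AdaIN, gives every feature map zero mean and unit variance, so a channel that was 100x stronger than another becomes indistinguishable in scale.

import torch
import torch.nn.functional as F

# One sample with two feature maps; channel 0 carries a much stronger signal.
x = torch.randn(1, 2, 8, 8)
x[:, 0] *= 100.0

# Instance normalization standardizes each feature map separately.
y = F.instance_norm(x)

# Before: per-channel standard deviations differ by roughly 100x.
print(x.std(dim=(2, 3)))
# After: both channels have (approximately) unit standard deviation, so the
# relative-magnitude information is gone.
print(y.std(dim=(2, 3)))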

  • Progressive growing appears to have a strong location preference for details like teeth and eyes (they do not move, no matter how the latent is interpolated)
    • Features stay in one place before quickly jumping to the next preferred location
    • This happens because the positions are already determined during low-resolution training

img

Weight Modulation and Demodulation

  • Goodbye, AdaIN.
  • The AdaIN operator is removed and replaced with a weight modulation and demodulation step
  • The purpose of instance normalization is to remove the effect of s from the statistics of the convolution's output feature maps.
    • This goal can be achieved more directly by weight modulation and demodulation
  • Weight modulation and demodulation remove the droplet artifacts

Interestingly, the original StyleGAN applies bias and noise within the style block, causing their relative impact to be inversely proportional to the current style's magnitudes. We observe that more predictable results are obtained by moving these operations outside the style block, where they operate on normalized data. Furthermore, we notice that after this change it is sufficient for the normalization and modulation to operate on the standard deviation alone (i.e., the mean is not needed). The application of bias, noise, and normalization to the constant input can also be safely removed without observable drawbacks.

The modulation scales each input feature map of the convolution based on the incoming style, which can alternatively be implemented by scaling the convolution weights:

w'_{ijk} = s_i \cdot w_{ijk}

where:

  • w and w' are the original and modulated weights, respectively
  • s_i is the scale corresponding to the i-th input feature map
  • j enumerates the output feature maps of the convolution
  • k enumerates the spatial footprint of the convolution

Assuming that the input activations are i.i.d. random variables with unit standard deviation, the output activations after modulation and convolution have a standard deviation of:

\sigma_{j} = \sqrt{\sum_{i,k} (w'_{ijk})^2}

i.e., the outputs are scaled by the L2 norm of the corresponding weights. The subsequent normalization aims to restore the outputs back to unit standard deviation.

w''_{ijk} = w'_{ijk} \bigg/ \sqrt{\sum_{i,k} (w'_{ijk})^2 + \epsilon} \approx w'_{ijk} \big/ \sigma_{j}

where

  • \epsilon is a small constant to avoid numerical issues.

Now we can adjust the weights based on s using the equations w'_{ijk} = s_i \cdot w_{ijk} and w''_{ijk} = w'_{ijk} \bigg/ \sqrt{\sum_{i,k} (w'_{ijk})^2 + \epsilon}.
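
As a quick sanity check of the statistical claim above (a sketch with arbitrary shapes, not code from the paper), we can push i.i.d. unit-variance noise through a convolution with modulated weights and compare the empirical per-output-channel standard deviation with \sigma_j:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
out_features, in_features, k = 4, 8, 3

# Modulated weights w'_{ijk} = s_i * w_{ijk} for a single style s.
w = torch.randn(out_features, in_features, k, k)
s = torch.rand(in_features) + 0.5
w_mod = w * s[None, :, None, None]

# i.i.d. input activations with unit standard deviation.
x = torch.randn(4096, in_features, 16, 16)
y = F.conv2d(x, w_mod)

# Empirical std per output feature map vs. the predicted sigma_j = sqrt(sum_{i,k} (w'_{ijk})^2);
# the two printed tensors should match closely.
print(y.std(dim=(0, 2, 3)))
print(w_mod.pow(2).sum(dim=(1, 2, 3)).sqrt())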

Compared to instance normalization, our demodulation technique is weaker because it is based on statistical assumptions about the signal instead of actual contents of the feature maps. Similar statistical analysis has been extensively used in modern network initializers, but we are not aware of it being previously used as a replacement for data-dependent normalization. Our demodulation is also related to weight normalization that performs the same calculation as a part of reparameterizing the weight tensor. Prior work has identified weight normalization as beneficial in the context of GAN training.
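
The PyTorch implementation below follows the modulation and demodulation equations above. It relies on an EqualizedWeight module, the learning-rate-equalized weight parameter used throughout StyleGAN-style generators, which is not shown in this note; a minimal sketch of it, assuming the standard equalized learning rate trick (weights stored as N(0, 1) samples and scaled by 1/sqrt(fan_in) at runtime), is:

import math

import torch
from torch import nn


class EqualizedWeight(nn.Module):
    # Learning-rate-equalized weight parameter: the weights are stored as
    # N(0, 1) samples and multiplied by a per-layer constant c at runtime.
    def __init__(self, shape):
        super().__init__()
        self.c = 1 / math.sqrt(math.prod(shape[1:]))
        self.weight = nn.Parameter(torch.randn(shape))

    def forward(self):
        return self.weight * self.c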

# Weight Modulation / Weight Demodulation
import torch
import torch.nn.functional as F
from torch import nn


class Conv2dWeightMod(nn.Module):
    # demod (demodulate) is a flag for whether to normalize the weights by their standard deviation
    # eps is the ϵ used when normalizing
    def __init__(self, in_features: int, out_features: int, kernel_size: int, demod: bool = True, eps: float = 1e-8):
        super().__init__()
        self.out_features = out_features
        self.demod = demod
        # "same" padding: P = floor((K - 1) / 2)
        self.padding = (kernel_size - 1) // 2
        # learning-rate-equalized weight parameter of shape [out_features, in_features, kernel_size, kernel_size]
        self.weight = EqualizedWeight([out_features, in_features, kernel_size, kernel_size])
        self.eps = eps

    def forward(self, x, s):
        # x: [batch_size, in_features, height, width], s: [batch_size, in_features]
        b, _, h, w = x.shape
        # Reshape the scales so they broadcast over the weight tensor
        s = s[:, None, :, None, None]
        weights = self.weight()[None, :, :, :, :]
        # Modulation: w'_{ijk} = s_i * w_{ijk}
        weights = weights * s  # [batch_size, out_features, in_features, kernel_size, kernel_size]
        if self.demod:
            # Demodulation: w''_{ijk} = w'_{ijk} / sqrt(sum_{i,k} (w'_{ijk})^2 + eps)
            sigma_inv = torch.rsqrt((weights ** 2).sum(dim=(2, 3, 4), keepdim=True) + self.eps)
            weights = weights * sigma_inv
        x = x.reshape(1, -1, h, w)
        _, _, *ws = weights.shape
        weights = weights.reshape(b * self.out_features, *ws)
        # Use a grouped convolution to efficiently apply a different
        # (modulated) kernel to each sample in the batch
        x = F.conv2d(x, weights, padding=self.padding, groups=b)
        # Reshape x back to [batch_size, out_features, height, width] and return
        return x.reshape(-1, self.out_features, h, w)
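
As a usage sketch (not the exact StyleGAN2 architecture; a plain nn.Linear stands in for the learning-rate-equalized affine layer, and the names StyleBlock and d_latent are illustrative), the noise and bias are then added after the modulated convolution, outside the style block, as the quoted passage above describes:

class StyleBlock(nn.Module):
    def __init__(self, d_latent: int, in_features: int, out_features: int):
        super().__init__()
        self.to_style = nn.Linear(d_latent, in_features)    # affine map from w to the scales s
        self.conv = Conv2dWeightMod(in_features, out_features, kernel_size=3)
        self.scale_noise = nn.Parameter(torch.zeros(1))      # learned noise strength
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.activation = nn.LeakyReLU(0.2, inplace=True)

    def forward(self, x, w, noise=None):
        s = self.to_style(w)     # per-input-channel scales, [batch_size, in_features]
        x = self.conv(x, s)      # modulate + convolve + demodulate
        if noise is not None:
            x = x + self.scale_noise * noise   # noise added on the demodulated output
        return self.activation(x + self.bias[None, :, None, None])


block = StyleBlock(d_latent=512, in_features=64, out_features=64)
x = torch.randn(1, 64, 8, 8)        # feature maps
w = torch.randn(1, 512)             # intermediate latent
noise = torch.randn(1, 1, 8, 8)     # per-pixel noise
out = block(x, w, noise)            # -> [1, 64, 8, 8]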

No Progressive Growing

  • Goodbye, Progressive Training

New: Projecting images to latent space (Delatent)

  • Find the latent code that reproduces a given image
  • Useful for image forensics (determine whether a given image is generated or real)
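
A minimal sketch of the projection idea (assuming a pretrained generator object g with a mapping network g.mapping, a synthesis network g.synthesis, and a latent size g.z_dim; these names are placeholders, and the paper additionally uses an LPIPS perceptual distance plus noise regularization, for which a plain pixel MSE stands in here):

import torch

def project(g, target, steps: int = 1000, lr: float = 0.1):
    # Initialize from the average intermediate latent, estimated by sampling.
    with torch.no_grad():
        w_avg = g.mapping(torch.randn(10_000, g.z_dim)).mean(dim=0)
    w = w_avg.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        img = g.synthesis(w.unsqueeze(0))
        # Reconstruction loss between the generated and the target image.
        loss = ((img - target) ** 2).mean()
        loss.backward()
        opt.step()
    return w.detach()

# For forensics: if the optimized latent reproduces the target image closely,
# the image was likely produced by this generator; a poor fit suggests a real
# (or differently generated) image.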