Overview of GANs - Architectures
Now I will explain the layers used in the Generator and Discriminator models, as well as the optimizers.
Discriminator Model
The Discriminator is responsible for Classification.
The Discriminator Model:
- Takes a sample as input (either real or generated)
- Produces a binary class label, real or fake (generated), as its output prediction
The general layer pattern of the Discriminator goes like this:

[Conv2D + LeakyReLU + BatchNormalization] x n -> Flatten -> Dense (Sigmoid)

where "x n" indicates repetition. The strided Conv2D is doing the Downsampling.
Conv2D + LeakyReLU
To reduce the difficulty of the problem, we Downsample using Strided Convolutions with LeakyReLU activation (don't use the standard ReLU, and don't use Pooling layers either).
- Best practice is a stride of 2x2 (a sketch follows below)
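As a minimal sketch of such a downsampling block (assuming the Keras API from TensorFlow; the filter count and input shape are illustrative, not prescribed by this article):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, LeakyReLU

model = Sequential()
# A 2x2-strided convolution halves the spatial resolution (28x28 -> 14x14),
# replacing a pooling layer.
model.add(Conv2D(64, (3, 3), strides=(2, 2), padding='same',
                 input_shape=(28, 28, 3)))
# LeakyReLU instead of the standard ReLU; alpha=0.2 is a common slope.
model.add(LeakyReLU(alpha=0.2))
```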
BatchNormalization
After LeakyReLU activation, we need to standardize layer outputs using BatchNormalization.
Flatten Layer
We need to flatten the feature maps before feeding them to the classifier.
Sigmoid Classifier
Sigmoid activation squashes values into the range between 0 and 1. Since the Discriminator must classify an image as real (1) or fake (0), we use sigmoid activation for the last Dense layer in the Discriminator (a full sketch follows below).
- 1 node is enough for this Dense layer
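Putting the layers above together, a minimal discriminator sketch (assuming Keras and 28x28 three-channel images; all sizes are illustrative):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Conv2D, LeakyReLU, BatchNormalization,
                                     Flatten, Dense)

def build_discriminator(input_shape=(28, 28, 3)):
    model = Sequential()
    # Downsample 28x28 -> 14x14 with a strided convolution
    model.add(Conv2D(64, (3, 3), strides=(2, 2), padding='same',
                     input_shape=input_shape))
    model.add(LeakyReLU(alpha=0.2))
    model.add(BatchNormalization())
    # Downsample 14x14 -> 7x7
    model.add(Conv2D(64, (3, 3), strides=(2, 2), padding='same'))
    model.add(LeakyReLU(alpha=0.2))
    model.add(BatchNormalization())
    # Flatten the feature maps and classify: real (1) vs. fake (0)
    model.add(Flatten())
    model.add(Dense(1, activation='sigmoid'))  # a single node is enough
    return model
```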
Generator Model
The Generator is responsible for Generation.
The Generator Model:
- takes a fixed-length random vector (a point in the latent space) as input (see the sampling sketch after this list)
- generates a sample from the target domain (e.g. an image) as output
- after training, the latent space forms a compressed representation of the data distribution
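For illustration, the input vectors are typically sampled from a standard Gaussian; the latent size of 100 below is a common but arbitrary choice:

```python
import numpy as np

latent_dim = 100   # common but arbitrary choice
batch_size = 16

# A batch of fixed-length random vectors (points in the latent space)
z = np.random.randn(batch_size, latent_dim)
print(z.shape)  # (16, 100)
```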
Sometimes ReLU is used as the activation function instead of LeakyReLU (for the Generator model only).
The general layer pattern of the Generator goes like this:

Dense + LeakyReLU + BatchNormalization -> Reshape -> [Conv2DTranspose + LeakyReLU + BatchNormalization] x n -> Conv2D + Tanh

where "x n" indicates repetition. The strided Conv2DTranspose is doing the Upsampling.
Fully Connected Layer + LeakyReLU + BatchNormalization
As mentioned before, the Generator takes a latent vector as input. We use a Dense (fully connected) layer + LeakyReLU activation + BatchNormalization to construct the foundation of our image (a sketch follows this list).
- The number of nodes of the FC layer must match the dimensions of the low-resolution image you want to form, i.e. width x height x channels (e.g. for a 7x7 starting image)
- If in the Discriminator you downsampled the image to 7x7, you should start from 7x7 and upsample it again
- Add a Reshape layer with the specified image dimensions after BatchNormalization
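A minimal sketch of this foundation step (assuming Keras, a 100-dimensional latent vector, and 128 feature maps of size 7x7; these sizes are illustrative):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LeakyReLU, BatchNormalization, Reshape

model = Sequential()
# 7 * 7 * 128 nodes: enough activations to form 128 feature maps of 7x7
model.add(Dense(7 * 7 * 128, input_dim=100))
model.add(LeakyReLU(alpha=0.2))
model.add(BatchNormalization())
# Reshape the flat vector into a low-resolution "image" to be upsampled
model.add(Reshape((7, 7, 128)))
```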
Conv2DTranspose + LeakyReLU
Transposed convolutional layers are also known as deconvolutional layers.
After constructing the foundation of the image, use strided transposed convolutional layers with LeakyReLU activation to upsample it (don't use the standard ReLU).
- Best practice is a stride of 2x2 (see the sketch below)
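A self-contained sketch of one such upsampling block (Keras; the 7x7x128 input matches the foundation sketch above):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2DTranspose, LeakyReLU

model = Sequential()
# A 2x2-strided transposed convolution doubles the resolution (7x7 -> 14x14)
model.add(Conv2DTranspose(128, (4, 4), strides=(2, 2), padding='same',
                          input_shape=(7, 7, 128)))
model.add(LeakyReLU(alpha=0.2))  # again, LeakyReLU rather than standard ReLU
```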
BatchNormalization
After LeakyReLU activation, we need to standardize layer outputs using BatchNormalization.
Conv2D + Tanh
The tanh activation function maps output values into the range [-1, 1] (so the real training images should be scaled to the same range).
- The output is a three-channel image, therefore the number of filters for the final Conv2D layer is 3 (a full generator sketch follows below)
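Putting it all together, a minimal generator sketch (assuming Keras, a 100-dimensional latent input, and a 28x28 three-channel output; all sizes are illustrative):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Dense, LeakyReLU, BatchNormalization,
                                     Reshape, Conv2DTranspose, Conv2D)

def build_generator(latent_dim=100):
    model = Sequential()
    # Foundation: project and reshape the latent vector to 7x7 feature maps
    model.add(Dense(7 * 7 * 128, input_dim=latent_dim))
    model.add(LeakyReLU(alpha=0.2))
    model.add(BatchNormalization())
    model.add(Reshape((7, 7, 128)))
    # Upsample 7x7 -> 14x14
    model.add(Conv2DTranspose(128, (4, 4), strides=(2, 2), padding='same'))
    model.add(LeakyReLU(alpha=0.2))
    model.add(BatchNormalization())
    # Upsample 14x14 -> 28x28
    model.add(Conv2DTranspose(128, (4, 4), strides=(2, 2), padding='same'))
    model.add(LeakyReLU(alpha=0.2))
    model.add(BatchNormalization())
    # Output: 3 filters for a three-channel image, tanh for values in [-1, 1]
    model.add(Conv2D(3, (7, 7), padding='same', activation='tanh'))
    return model
```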
Optimizer
Loss Function for Discriminator
Since the Discriminator outputs a probability between 0 and 1 for a given image (0 for fake, 1 for real), we can use the binary cross-entropy loss function.
The Discriminator model is trained like any other binary classification deep learning model.
- The loss is commonly implemented as the binary cross-entropy loss function
- Best practice is to use Adam as the optimizer with a small learning rate and conservative momentum (see the sketch below)
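A sketch of compiling the discriminator this way, reusing build_discriminator from the earlier sketch (learning_rate=0.0002 and beta_1=0.5 are commonly used DCGAN values, not prescribed by this article):

```python
from tensorflow.keras.optimizers import Adam

discriminator = build_discriminator()
# Binary cross-entropy loss; Adam with a small learning rate and
# conservative momentum (beta_1)
discriminator.compile(loss='binary_crossentropy',
                      optimizer=Adam(learning_rate=0.0002, beta_1=0.5),
                      metrics=['accuracy'])
```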
Loss Function for Generator
The generator is not updated directly and there is no loss for this model. Instead, the discriminator is used to provide a learned or indirect loss function for the generator.
Compositing the Two Models
The generator model is trained via the discriminator model in a composite model architecture (see the sketch after this list):
- Declare a new model
- First, make the weights in the discriminator not trainable
- Then add the generator model
- Then add the discriminator model, so the generator's output feeds into the discriminator
- Lastly, compile the model with the binary cross-entropy loss function
- Use Adam as the optimizer with a small learning rate and conservative momentum
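A minimal sketch of these steps, reusing the build_generator and build_discriminator sketches above:

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers import Adam

def build_gan(generator, discriminator):
    # Freeze the discriminator's weights so that only the generator
    # is updated when the composite model is trained
    discriminator.trainable = False
    model = Sequential()
    model.add(generator)      # latent vector -> generated image
    model.add(discriminator)  # generated image -> real/fake probability
    model.compile(loss='binary_crossentropy',
                  optimizer=Adam(learning_rate=0.0002, beta_1=0.5))
    return model

generator = build_generator()
gan = build_gan(generator, discriminator)
```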
Reference
A Gentle Introduction to Generative Adversarial Networks (GANs)
Create Data from Random Noise with Generative Adversarial Networks
How to understand deconvolution networks in deep learning?