Paper Review - Inpainting with DDPM (RePaint)
RePaint: Inpainting using Denoising Diffusion Probabilistic Models (CVPR 2022)
Code: https://github.com/andreas128/RePaint
Most existing Inpainting approaches train for a certain distribution of masks, which limits their generalization capabilities to unseen mask types. Furthermore, training with pixel-wise and perceptual losses often leads to simple textural extensions towards the missing areas instead of semantically meaningful generation. In this work, we propose RePaint: A Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach that is applicable to even extreme masks. We employ a pretrained unconditional DDPM as the generative prior.
- A pretrained unconditional DDPM is used as the generative prior
- Only the reverse diffusion iterations are altered, by
- sampling the unmasked regions using the given image information
- Modified GLIDE code
Advantages:
- Allows the network to generalize to any mask during inference.
- Enables more semantically meaningful generation, since the network has a powerful DDPM image-synthesis prior.
- Works even for extreme mask cases
Method
Conditioning on the known region
We apply a mask $m$ to the ground-truth image $x$ and treat the masked-out pixels as the unknown region. We denote:
-
Known regions: $m \odot x$
-
Unknown regions: $(1 - m) \odot x$
We can alter the known regions at each iteration, since every reverse step from $x_t$ to $x_{t-1}$ depends solely on $x_t$, as long as we keep the correct properties of the corresponding distribution.
In the inference stage with a trained unconditional DDPM, each single reverse step is modified such that the known part sampled from the forward process (denoted $x_{t-1}^{\text{known}}$) and the predicted unknown part (denoted $x_{t-1}^{\text{unknown}}$) are combined:

$$x_{t-1}^{\text{known}} \sim \mathcal{N}\big(\sqrt{\bar\alpha_t}\, x_0,\ (1-\bar\alpha_t)\mathbf{I}\big)$$
$$x_{t-1}^{\text{unknown}} \sim \mathcal{N}\big(\mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big)$$
$$x_{t-1} = m \odot x_{t-1}^{\text{known}} + (1-m) \odot x_{t-1}^{\text{unknown}}$$
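One conditioned reverse step can be sketched in code as follows. This is a minimal NumPy illustration; `condition_on_known` and its argument names are my own, not identifiers from the paper's codebase:

```python
import numpy as np

def condition_on_known(x0, mask, x_prev_unknown, alpha_bar, rng):
    """Combine the known region (sampled from the forward process)
    with the DDPM's prediction for the unknown region."""
    # x_{t-1}^known ~ N(sqrt(alpha_bar) * x0, (1 - alpha_bar) * I)
    noise = rng.standard_normal(x0.shape)
    x_prev_known = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
    # x_{t-1} = m * x_{t-1}^known + (1 - m) * x_{t-1}^unknown
    return mask * x_prev_known + (1 - mask) * x_prev_unknown
```

Note that the DDPM itself is untouched: the conditioning happens purely by overwriting the known pixels after each reverse step.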
Resampling
When directly applying the method described above, we observe that only the content type matches with the known regions.
- Although the inpainted region matches the texture of the neighboring region, it is semantically incorrect
- The DDPM leverages the context of the known region, but does not harmonize it well with the rest of the image
The model predicts $x_{t-1}$ using $x_t$, which comprises the DDPM output for the unknown region and the forward-process sample for the known region. However, the sampling of the known pixels is performed without considering the generated parts of the image, which introduces disharmony.
Although the model tries to harmonize the image again in every step, it can never fully converge because the same issue occurs in the next step. Moreover, in each reverse step, the maximum change to an image declines due to the variance schedule of $\beta_t$. Thus, the method cannot correct mistakes that lead to disharmonious boundaries in the subsequent steps due to restricted flexibility.
We want the model to take the generated parts of the image into account as well. Therefore, the authors introduce a resampling approach, which makes use of the DDPM's ability to harmonize its input.
- The model needs more time to harmonize the conditional information with the generated information in one step before advancing to the next denoising step.
- In addition to the original denoising steps:
- we diffuse the output $x_{t-1}$ back to $x_t$ by sampling from the forward process, $x_t \sim \mathcal{N}(\sqrt{1-\beta_t}\, x_{t-1},\ \beta_t \mathbf{I})$. Although this operation scales back the output and adds noise, some information merged into the generated region $x_{t-1}^{\text{unknown}}$ is still preserved in $x_t^{\text{unknown}}$.
- Therefore, the new $x_t^{\text{unknown}}$ is more harmonized with $x_t^{\text{known}}$ and contains conditional information from it.
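A single forward jump can be sketched as below; an illustrative NumPy snippet, where `beta_t` is the variance-schedule value at step $t$ (the function name is my own):

```python
import numpy as np

def resample_forward(x_prev, beta_t, rng):
    # One forward diffusion step: x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I).
    # The generated content in x_{t-1} is only scaled and noised, not discarded,
    # so the next reverse step can harmonize it with the known region.
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise
```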
However, a new problem arises: since the resampling operation can only harmonize one step at a time, it might not be able to merge the semantic information over the entire denoising process.
- We denote the time horizon of this operation as the jump length $j$
- For example, for a chosen jump length of $j = 10$, we apply 10 forward transitions before applying 10 reverse transitions.
- For jump length $j = 1$, the DDPM is more likely to output a blurry image.
- the resampling also increases the runtime of the reverse diffusion.
- Smaller jump lengths tend to produce blurrier images
- increased number of resamplings improves the overall image consistency (performance)
The authors found no visible improvement from simply slowing down the diffusion process. Therefore, the resampling is instead applied through the time schedule, which can be illustrated as code:
```python
t_T = 250
# Get the time schedule with the parameters time T, jump length, and number of resamplings
```
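A sketch of such a schedule generator, modeled after `get_schedule_jump` in the public RePaint repository (parameter names are approximate):

```python
def get_schedule_jump(t_T, jump_length, n_resample):
    """Return the sequence of timesteps: mostly descending from t_T - 1 to 0,
    but every `jump_length` steps, jump back up `jump_length` steps,
    repeated (n_resample - 1) times, to resample and re-harmonize."""
    # At which timesteps to jump back, and how many times
    jumps = {}
    for j in range(0, t_T - jump_length, jump_length):
        jumps[j] = n_resample - 1
    t, ts = t_T, []
    while t >= 1:
        t -= 1          # one reverse (denoising) transition
        ts.append(t)
        if jumps.get(t, 0) > 0:
            jumps[t] -= 1
            for _ in range(jump_length):
                t += 1  # one forward (noising) transition
                ts.append(t)
    return ts
```

With `t_T = 250`, `jump_length = 10`, and `n_resample = 10`, this yields roughly ten times more reverse steps than plain sampling, which is the runtime cost noted above.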
- There is no visible improvement from slowing down the diffusion process (increasing $T$).
- Performance is better when applying a larger jump length ($j = 10$) than smaller jump lengths
Possible Applications
Face anonymization
RePaint could be used for the anonymization of faces.
- For example, one could remove the information about the identity of people shown at public events and hallucinate artificial faces for data protection.
Super-resolution | Image Upscaling
Limitations
- The algorithm might be biased towards the training dataset
- since it relies on an unconditional pretrained DDPM
- Difficult to apply in real time
- DDPM sampling is significantly slower than GAN-based and autoregressive-based methods