Paper Review - Inpainting with DDPM (RePaint)
RePaint: Inpainting using Denoising Diffusion Probabilistic Models (CVPR 2022)
Code: https://github.com/andreas128/RePaint
Most existing Inpainting approaches train for a certain distribution of masks, which limits their generalization capabilities to unseen mask types. Furthermore, training with pixel-wise and perceptual losses often leads to simple textural extensions towards the missing areas instead of semantically meaningful generation. In this work, we propose RePaint: A Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach that is applicable to even extreme masks. We employ a pretrained unconditional DDPM as the generative prior.
- A pretrained unconditional DDPM is used as the generative prior
- Only the reverse diffusion iterations are altered, by
- sampling the unmasked regions using the given image information
- Modified GLIDE code
Advantages:
- Allows the network to generalize to any mask during inference.
- Enables more semantically meaningful generation, since the network has a powerful DDPM image-synthesis prior.
- Works even for extreme mask cases
Method
Conditioning on the known region
We apply a mask $m$ to the ground-truth image $x$ and treat the masked-out pixels as the unknown region. We denote:
-
Known regions: $m \odot x$
-
Unknown regions: $(1 - m) \odot x$
We can alter the known regions at each iteration, since every reverse step from $x_t$ to $x_{t-1}$ depends solely on $x_t$, as long as we keep the correct properties of the corresponding distribution.
In the inference stage with a trained unconditional DDPM, each single reverse step is modified such that the known part sampled from the forward process (denoted $x_{t-1}^{\text{known}}$) and the predicted unknown part (denoted $x_{t-1}^{\text{unknown}}$) are combined:

$$x_{t-1}^{\text{known}} \sim \mathcal{N}\big(\sqrt{\bar\alpha_t}\, x_0,\ (1-\bar\alpha_t)\mathbf{I}\big)$$
$$x_{t-1}^{\text{unknown}} \sim \mathcal{N}\big(\mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big)$$
$$x_{t-1} = m \odot x_{t-1}^{\text{known}} + (1-m) \odot x_{t-1}^{\text{unknown}}$$
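One conditioned reverse step can be sketched in code as follows. This is a minimal NumPy illustration; `condition_on_known` and its argument names are my own, not identifiers from the paper's codebase:

```python
import numpy as np

def condition_on_known(x0, mask, x_prev_unknown, alpha_bar, rng):
    """Combine the known region (sampled from the forward process)
    with the DDPM's prediction for the unknown region."""
    # x_{t-1}^known ~ N(sqrt(alpha_bar) * x0, (1 - alpha_bar) * I)
    noise = rng.standard_normal(x0.shape)
    x_prev_known = np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * noise
    # x_{t-1} = m * x_{t-1}^known + (1 - m) * x_{t-1}^unknown
    return mask * x_prev_known + (1 - mask) * x_prev_unknown
```

Note that the DDPM itself is untouched: the conditioning happens purely by overwriting the known pixels after each reverse step.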
Resampling
When directly applying the method described above, we observe that only the content type matches with the known regions.
- Although the inpainted region matches the texture of the neighboring region, it is semantically incorrect
- The DDPM leverages the context of the known region, but does not harmonize it well with the rest of the image
The model predicts $x_{t-1}$ using $x_t$, which comprises the DDPM output for the unknown region and the forward-process sample for the known region. However, the sampling of the known pixels is performed without considering the generated parts of the image, which introduces disharmony.
Although the model tries to harmonize the image again in every step, it can never fully converge because the same issue occurs in the next step. Moreover, in each reverse step, the maximum change to an image declines due to the variance schedule of $\beta_t$. Thus, the method cannot correct mistakes that lead to disharmonious boundaries in the subsequent steps due to restricted flexibility.
We want the model to take the generated parts of the image into account as well. Therefore, the authors introduce a resampling approach, which makes use of the DDPM's ability to harmonize its input.
- The model needs more time to harmonize the conditional information with the generated information in one step before advancing to the next denoising step.
- In addition to the original denoising steps:
- we diffuse the output $x_{t-1}$ back to $x_t$ by sampling from the forward process, $x_t \sim \mathcal{N}(\sqrt{1-\beta_t}\, x_{t-1},\ \beta_t \mathbf{I})$. Although this operation scales back the output and adds noise, some information merged into the generated region $x_{t-1}^{\text{unknown}}$ is still preserved in $x_t^{\text{unknown}}$.
- Therefore, the new $x_t^{\text{unknown}}$ is more harmonized with $x_t^{\text{known}}$ and contains conditional information from it.
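A single forward jump can be sketched as below; an illustrative NumPy snippet, where `beta_t` is the variance-schedule value at step $t$ (the function name is my own):

```python
import numpy as np

def resample_forward(x_prev, beta_t, rng):
    # One forward diffusion step: x_t ~ N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I).
    # The generated content in x_{t-1} is only scaled and noised, not discarded,
    # so the next reverse step can harmonize it with the known region.
    noise = rng.standard_normal(x_prev.shape)
    return np.sqrt(1.0 - beta_t) * x_prev + np.sqrt(beta_t) * noise
```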
However, a new problem arises: since the resampling operation can only harmonize one step at a time, it might not be able to merge the semantic information over the entire denoising process.
- We denote the time horizon of this operation as the jump length $j$
- For example, for a chosen jump length of $j = 10$, we apply 10 forward transitions before applying 10 reverse transitions.
- For jump length $j = 1$, the DDPM is more likely to output a blurry image.
- the resampling also increases the runtime of the reverse diffusion.
- Smaller jump lengths tend to produce blurrier images
- increased number of resamplings improves the overall image consistency (performance)
The authors found no visible improvement from simply slowing down the diffusion process. Therefore, the resampling is instead applied through the time schedule, which can be illustrated as code:
```python
t_T = 250
# Get the time schedule with the parameters time T, jump length, and number of resamplings
```
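A sketch of such a schedule generator, modeled after `get_schedule_jump` in the public RePaint repository (parameter names are approximate):

```python
def get_schedule_jump(t_T, jump_length, n_resample):
    """Return the sequence of timesteps: mostly descending from t_T - 1 to 0,
    but every `jump_length` steps, jump back up `jump_length` steps,
    repeated (n_resample - 1) times, to resample and re-harmonize."""
    # At which timesteps to jump back, and how many times
    jumps = {}
    for j in range(0, t_T - jump_length, jump_length):
        jumps[j] = n_resample - 1
    t, ts = t_T, []
    while t >= 1:
        t -= 1          # one reverse (denoising) transition
        ts.append(t)
        if jumps.get(t, 0) > 0:
            jumps[t] -= 1
            for _ in range(jump_length):
                t += 1  # one forward (noising) transition
                ts.append(t)
    return ts
```

With `t_T = 250`, `jump_length = 10`, and `n_resample = 10`, this yields roughly ten times more reverse steps than plain sampling, which is the runtime cost noted above.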
- There is no visible improvement from slowing down the diffusion process (increasing $T$).
- Performance is better when applying a larger jump length ($j = 10$) than smaller jump lengths
Possible Applications
Face anonymization
RePaint could be used for the anonymization of faces.
- For example, one could remove the information about the identity of people shown at public events and hallucinate artificial faces for data protection.
Super-resolution | Image Upscaling
Limitations
- The algorithm might be biased towards the training dataset
- since it relies on an unconditional pretrained DDPM
- Difficult to apply in real time
- DDPM sampling is significantly slower than GAN-based and autoregressive-based methods