
CycleGAN : Unpaired Image-to-Image Translation - Research Analysis

장민스기 2021. 8. 19. 21:26

CycleGAN, proposed by Jun-Yan Zhu et al., is a generative adversarial model capable of translating images from a source domain to a target domain in the absence of paired training data. This was an astonishing result compared to other models, because its core concept was very simple while it still generated highly realistic images. As a consequence, CycleGAN could be applied to a wide variety of tasks and datasets. Click below to read the original paper (Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks).

https://arxiv.org/abs/1703.10593

 


Unpaired Images

Widely used GAN models that transfer an image's style or distribution require paired training data, which is rare and costly. Think about a model that detects an image's edges. To train it, we would need a set of random images, each paired with its edge map. This kind of paired dataset cannot be collected easily, and such a model cannot benefit from simply adding more data to enhance its performance. CycleGAN, however, can be trained with unpaired images, where each group follows its own distribution. This is a huge advantage: we become able to add data to the training set regardless of pairing.

Example of paired and unpaired data - from original paper

The Cycle

CycleGAN is named for the cycle its structure forms between two GAN models that keep exchanging their output images. Let's assume that a translator named G translates images from domain X to Y, and F translates in the reverse direction; then they can be expressed as below.

$$G:X \rightarrow Y$$

$$F:Y \rightarrow X$$

We can now train the two translators independently to map images from each source domain to the other. However, we cannot guarantee that the translation will be semantically correct. For example, a badly trained generator could translate an image's season from winter to summer while also changing the place from a mountain to a beach.

Correct seasonal translation! - from original paper

In order to constrain the mapping, a cycle consistency loss is added to the total loss to prevent arbitrary translation. So, what is cycle consistency? If we map an image x from X to Y by passing it through G, then its result $\hat{y} = G(x)$ needs to be mapped back to the original x by the translator F. This means that both translators G and F should be bijections.

$$F(G(x)) \approx x \qquad \textrm{forward cycle consistency}$$

$$G(F(y)) \approx y \qquad \textrm{backward cycle consistency}$$
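To make the two consistency conditions concrete, here is a toy sketch where the translators are hypothetical invertible linear maps standing in for the learned networks (illustration only, not the paper's networks):

```python
import numpy as np

# Toy stand-ins for the learned translators G: X -> Y and F: Y -> X.
# Any pair of mutually inverse functions satisfies cycle consistency exactly.
def G(x):
    return 2.0 * x + 1.0    # hypothetical forward translator

def F(y):
    return (y - 1.0) / 2.0  # hypothetical inverse translator

x = np.array([0.5, -1.0, 3.0])  # "images" from domain X
y = np.array([2.0, 0.0, -4.0])  # "images" from domain Y

# Forward cycle consistency: F(G(x)) ≈ x
assert np.allclose(F(G(x)), x)
# Backward cycle consistency: G(F(y)) ≈ y
assert np.allclose(G(F(y)), y)
```

Real translators are deep networks that only approximate this inverse relationship, which is exactly what the cycle consistency loss below encourages.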

Formulation

In order to establish the total loss function which will be used in training, we need to add both the adversarial losses and the cycle consistency losses. 

Adversarial Loss

We need to apply an adversarial loss to both translators G and F. For the translator $G:X \rightarrow Y$, there is a discriminator $D_Y$, and its adversarial loss is given below.

$$\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))] \qquad \textrm{(1)}$$

The symbol $y \sim p_{data}(y)$ means that the expectation is calculated over the distribution $p_{data}$ that data in domain Y follows. In this loss function the discriminator $D_Y$ aims to maximize the objective, while the generator G tries to minimize it. The loss function for F and its corresponding discriminator $D_X$ has the same form.

Cycle Consistency Loss

Training with the adversarial loss alone would let an input image map to an arbitrary target image unrelated to its original. To remove this kind of randomness, we add a loss that forces each translator to produce images that are likely to be translated back to the original image. The formulated loss is a combination of the forward and backward cycle consistency losses, measured with the L1 norm.

$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\lVert F(G(x)) - x \rVert_1] + \mathbb{E}_{y \sim p_{data}(y)}[\lVert G(F(y)) - y \rVert_1] \qquad \textrm{(2)}$$

Full Objective

The final combined loss function is formed as below and is used for training.

$$\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \, \mathcal{L}_{cyc}(G, F) \qquad \textrm{(3)}$$

The first two terms are the adversarial losses (a), and the last term is the forward/backward cycle consistency loss (b)(c), weighted by $\lambda$ to control its importance relative to the adversarial terms.
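The full objective can be sketched as plain batch arithmetic. The sketch below assumes toy score and image arrays (all values are illustrative) and uses $\lambda = 10$ as in the paper:

```python
import numpy as np

def l_gan(d_real, d_fake):
    # Adversarial term (a): E[log D(real)] + E[log(1 - D(fake))]
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))

def l_cyc(x, x_rec, y, y_rec):
    # Cycle terms (b)+(c): L1 reconstruction error in both directions,
    # where x_rec = F(G(x)) and y_rec = G(F(y))
    return np.mean(np.abs(x_rec - x)) + np.mean(np.abs(y_rec - y))

def full_objective(d_y_real, d_y_fake, d_x_real, d_x_fake,
                   x, x_rec, y, y_rec, lam=10.0):
    return (l_gan(d_y_real, d_y_fake)
            + l_gan(d_x_real, d_x_fake)
            + lam * l_cyc(x, x_rec, y, y_rec))

# Toy batch: discriminator scores and a slightly imperfect reconstruction
x = np.array([0.0, 1.0]); x_rec = np.array([0.1, 0.9])  # F(G(x)), hypothetical
y = np.array([1.0, 0.0]); y_rec = y.copy()              # G(F(y)), perfect here
total = full_objective(np.array([0.9]), np.array([0.1]),
                       np.array([0.9]), np.array([0.1]),
                       x, x_rec, y, y_rec)
print(total)
```

In actual training, G and F descend on this objective while $D_X$ and $D_Y$ ascend on the adversarial terms; the sketch only shows how the three pieces combine.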

Implementation Details

Network Architecture

For the generative network, the researchers used the style transfer network designed by Johnson et al., and a network named PatchGAN for the discriminator. PatchGAN is a patch-level discriminator, which requires fewer parameters and can be applied to images of arbitrary size.

Training Details

The researchers changed the loss function a little by replacing the negative log likelihood with a least-squares loss. This change led to more stable training and higher-quality results. They also chose to use the generated-image history technique by Shrivastava et al., which is explained in a recent post on SimGAN.
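The history technique keeps a buffer of previously generated images and sometimes shows the discriminator an old fake instead of the newest one, which reduces oscillation. A minimal sketch, assuming the paper's buffer size of 50 and a 50% swap probability (the class name and interface are my own):

```python
import random

class ImagePool:
    """History buffer in the spirit of Shrivastava et al.: with probability
    0.5, return a previously generated image to the discriminator and store
    the new one in its place (sketch only)."""
    def __init__(self, size=50):
        self.size = size
        self.images = []

    def query(self, image):
        # Fill the pool first; early fakes pass straight through.
        if len(self.images) < self.size:
            self.images.append(image)
            return image
        # Pool is full: half the time, swap the new fake for a stored one.
        if random.random() < 0.5:
            idx = random.randrange(self.size)
            old, self.images[idx] = self.images[idx], image
            return old
        return image
```

Usage: each training step, pass the freshly generated fake through `pool.query(...)` before feeding it to the discriminator.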

Results

As introduced above, CycleGAN has the strong advantage of utilizing unpaired datasets. Accordingly, CycleGAN was trained on various tasks and compared with other leading models. I'll show some of the results so you can appreciate CycleGAN's remarkable performance.

Collection style transfer

CycleGAN was able to learn the style of an entire collection of artworks. The trained model could transfer any image into a generated picture resembling paintings by various artists.

Example of Collection Style Transfer - from original paper

Object Transfiguration

The model was also able to transfer images from one visual class to another by changing textures or colors that appear in the training set.

Example of Object Transfiguration - from original paper

 

Photo Generation from Paintings

By adding one more term to the full loss function, CycleGAN could generate a realistic image from a painting, as if it were a photo someone had just taken. The added term is called the identity loss, and it works similarly to a regularizer. With the identity loss, both translators try to touch the image as little as possible, which makes them avoid unnecessary changes. This preserves the original image's features well.
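The identity regularizer can be sketched the same way as the cycle loss: when a translator receives an image that is already in its target domain, it should return it unchanged, so we penalize $\lVert G(y) - y \rVert_1 + \lVert F(x) - x \rVert_1$ (toy arrays below are illustrative):

```python
import numpy as np

def identity_loss(x, f_x, y, g_y):
    """Identity regularizer (sketch): G(y) ≈ y and F(x) ≈ x,
    measured with the L1 norm, where f_x = F(x) and g_y = G(y)."""
    return np.mean(np.abs(g_y - y)) + np.mean(np.abs(f_x - x))

# Toy check: translators that leave same-domain inputs untouched incur zero loss
x = np.array([0.2, 0.5])
y = np.array([0.7, 0.1])
print(identity_loss(x, x, y, y))  # → 0.0
```

During training this term is simply added to the full objective (3) with its own weight.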

Examples with/without identity loss - from original paper

Limitations

As illustrated above, CycleGAN seems versatile across the whole image translation area. However, it has its limitations. The main flaw of CycleGAN is that it is only capable of small translation tasks such as changing colors or textures. When CycleGAN was applied to tasks that require large geometric transformations, it worked quite poorly.

Translation with large geometric changes - from original paper

Another limitation is that the model strongly depends on the distribution of the input dataset. As you can see in the example below, if the training set doesn't include specific images (a horse with a rider) that lie slightly outside the overall distribution (wild horses), then the model produces a strange image.

Image out of distribution - from original paper

The last limitation is that CycleGAN didn't perform as well as translation models that can only be fed paired inputs. The ultimate goal of CycleGAN was to close the gap with models trained on paired datasets, but it could not fully reach their performance.