Image-to-Image Translation with Conditional Adversarial Networks
Paper: arXiv 1611.07004
A conditional GAN implementing the pix2pix framework (Isola et al., 2017) for paired image-to-image translation. This model translates edge maps of handbags into realistic photographic renderings.
Pix2Pix learns a mapping from input condition images to output target images using an adversarial training objective. The generator is supervised by both an adversarial loss (fooling the discriminator) and an L1 reconstruction loss (staying close to the ground truth).
Generator: U-Net
- Encoder: Conv2d + BatchNorm2d + ReLU + MaxPool2d blocks, doubling channels at each stage
- Decoder: Upsample (bilinear) + Conv2d blocks with skip connections from the corresponding encoder stage

Discriminator: PatchGAN
- Adversarial loss: BCEWithLogitsLoss

L_total = L_adversarial + lambda_recon * L_L1
        = BCEWithLogitsLoss + 200 * L1Loss
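
The generator objective above can be sketched in PyTorch as follows. This is a minimal illustration rather than the code in `train.py`: the function name `generator_loss` and the PatchGAN output shape `(B, 1, 30, 30)` used in the toy example are assumptions. Only `BCEWithLogitsLoss`, `L1Loss`, and the weight of 200 come from this card.

```python
import torch
import torch.nn as nn

adv_criterion = nn.BCEWithLogitsLoss()   # adversarial term
recon_criterion = nn.L1Loss()            # reconstruction term
lambda_recon = 200                       # L1 weight listed in this card


def generator_loss(disc_fake_logits, fake, real):
    """L_total = L_adversarial + lambda_recon * L_L1 (hypothetical helper)."""
    # The generator is rewarded when the discriminator scores its patches as real (label 1).
    adv = adv_criterion(disc_fake_logits, torch.ones_like(disc_fake_logits))
    # L1 distance keeps the output close to the ground-truth image.
    recon = recon_criterion(fake, real)
    return adv + lambda_recon * recon


# Toy tensors standing in for real data; shapes are assumptions for illustration.
fake = torch.zeros(2, 3, 256, 256)
real = torch.full((2, 3, 256, 256), 0.5)
disc_fake_logits = torch.zeros(2, 1, 30, 30)
loss = generator_loss(disc_fake_logits, fake, real)
```

With a weight of 200 the L1 term dominates early training, so the adversarial term mainly sharpens detail once the coarse reconstruction is in place.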
| Parameter | Value |
|---|---|
| Dataset | edges2handbags |
| Image resolution | 256 x 256 |
| Epochs | 50 (checkpoint saved at epoch 59) |
| Batch size | 4 |
| Learning rate | 0.0002 |
| Optimizer | Adam (both G and D) |
| Weight initialization | Normal distribution (mean=0, std=0.02) |
| lambda_recon (L1 weight) | 200 |
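
The optimizer and initialization settings in the table might be wired up roughly as below. The two small `nn.Sequential` networks are stand-ins, not the models in `UNet.py`, and the choice to initialize only convolution weights is an assumption, since the card does not say which layers the initialization covers. Adam's `betas` are left at the PyTorch defaults because the card does not list them.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def weights_init(module):
    """Draw convolution weights from Normal(mean=0, std=0.02) (hypothetical helper)."""
    if isinstance(module, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(module.weight, mean=0.0, std=0.02)


# Stand-in networks; the real generator and discriminator live in UNet.py.
gen = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 3, kernel_size=3, padding=1),
)
# Assumption: the discriminator sees condition and image concatenated (3 + 3 channels).
disc = nn.Sequential(
    nn.Conv2d(6, 64, kernel_size=4, stride=2, padding=1),
    nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, kernel_size=4, padding=1),
)

gen.apply(weights_init)
disc.apply(weights_init)

lr = 0.0002  # learning rate from the table
gen_opt = torch.optim.Adam(gen.parameters(), lr=lr)
disc_opt = torch.optim.Adam(disc.parameters(), lr=lr)
```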
| File | Description |
|---|---|
| train.py | Training loop |
| UNet.py | Generator (U-Net) and discriminator architecture |
| utils.py | Helper functions |
| dataset.sh | Downloads the edges2handbags dataset |
| Pix2Pix_Epoch59.pth | Saved generator checkpoint |
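
For readers who want a feel for the generator's building blocks without opening `UNet.py`, here is a rough sketch of one encoder stage and one decoder stage as this card describes them. It is not the repository's code: the class names, the 3x3 kernels, and the concatenate-then-convolve treatment of the skip connection are all assumptions.

```python
import torch
import torch.nn as nn


class EncoderBlock(nn.Module):
    """Conv2d + BatchNorm2d + ReLU + MaxPool2d; doubles the channel count."""

    def __init__(self, in_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, in_ch * 2, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(in_ch * 2)
        self.act = nn.ReLU()
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        # Spatial size halves, channels double.
        return self.pool(self.act(self.bn(self.conv(x))))


class DecoderBlock(nn.Module):
    """Bilinear Upsample + Conv2d, fused with the matching encoder feature map."""

    def __init__(self, in_ch):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=True)
        self.reduce = nn.Conv2d(in_ch, in_ch // 2, kernel_size=3, padding=1)
        self.fuse = nn.Conv2d(in_ch, in_ch // 2, kernel_size=3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x, skip):
        x = self.reduce(self.up(x))        # (B, in_ch / 2, 2H, 2W)
        x = torch.cat([x, skip], dim=1)    # skip connection: back to in_ch channels
        return self.act(self.fuse(x))      # (B, in_ch / 2, 2H, 2W)


# Shape check on a toy feature map.
x0 = torch.randn(1, 32, 64, 64)
x1 = EncoderBlock(32)(x0)       # one stage down
y = DecoderBlock(64)(x1, x0)    # one stage up, reusing x0 as the skip
```

The skip connection lets the decoder reuse fine spatial detail from the encoder, which matters for edge-to-photo translation because the output must follow the input contours closely.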
import torch
from UNet import UNet
device = "cuda" if torch.cuda.is_available() else "cpu"
gen = UNet(input_dim=3, real_dim=3).to(device)
checkpoint = torch.load("Pix2Pix_Epoch59.pth", map_location=device)
gen.load_state_dict(checkpoint)
gen.eval()
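# condition: edge-map tensor of shape (B, 3, 256, 256), normalized to [-1, 1]
# Placeholder input so the snippet runs end to end; replace with a real edge map.
condition = torch.rand(1, 3, 256, 256, device=device) * 2 - 1
with torch.no_grad():
    fake = gen(condition)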
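
The snippet above expects `condition` already scaled to [-1, 1]. One way to get there from an 8-bit RGB image, and to map a generated batch back to 8-bit for viewing, is sketched below. The helper names are hypothetical, and the inverse mapping assumes the generator's output uses the same [-1, 1] range as its input, which this card does not state.

```python
import torch


def to_model_range(img_uint8):
    """uint8 image (H, W, 3) in [0, 255] -> float batch (1, 3, H, W) in [-1, 1]."""
    x = img_uint8.permute(2, 0, 1).float() / 255.0
    return (x * 2.0 - 1.0).unsqueeze(0)


def to_uint8(batch):
    """float batch (B, 3, H, W) in [-1, 1] -> uint8 batch (B, H, W, 3) in [0, 255]."""
    x = (batch.clamp(-1.0, 1.0) + 1.0) / 2.0 * 255.0
    return x.round().to(torch.uint8).permute(0, 2, 3, 1)


# Random stand-in for an edge map that has already been resized to 256 x 256.
edge = torch.randint(0, 256, (256, 256, 3), dtype=torch.uint8)
condition = to_model_range(edge)
restored = to_uint8(condition)[0]
```

Loading the image from disk and resizing it to 256 x 256 are left out here; any image library that yields an `(H, W, 3)` uint8 array can feed `to_model_range` after conversion with `torch.from_numpy`.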
License: MIT