# GEN AI – Programming Assignment
## Generative Models
---
## Submission Policy
Both questions must be submitted together in a **SINGLE zip file** named:
```
{NAME}_{STUDENT_ID}.zip
```
The zip file must contain all code folders for both questions and one combined PDF report.
**Do NOT include** datasets, model checkpoints, or large binary files.
---
# Question 1 • 25 Marks
## Denoising Diffusion Probabilistic Models (DDPM)
> Implement DDPM from scratch: forward/reverse process, training objective, and ControlNet conditioning.
### Environment Setup
Create a conda environment named `ddpm` and install PyTorch:
```bash
conda create --name ddpm python=3.10
conda activate ddpm
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
```
### Code Structure
```
ddpm_assignment/
├── 2d_plot_diffusion_todo/              (Task 1)
│   ├── ddpm_tutorial.ipynb              <-- Main notebook
│   ├── dataset.py                       <-- Swiss-roll, moon, gaussians
│   ├── network.py                       <-- (TODO) Noise prediction network
│   └── ddpm.py                          <-- (TODO) DDPM pipeline
│
├── task_1_controlnet/                   (Task 2)
│   ├── diffusion/
│   │   ├── unets/
│   │   │   ├── unet_2d_condition.py     <-- (TODO) Integrate ControlNet into UNet
│   │   │   └── unet_2d_blocks.py        <-- Basic UNet components
│   │   ├── controlnet.py                <-- (TODO) Implement ControlNet
│   │   └── pipeline_controlnet.py       <-- Diffusion pipeline with ControlNet
│   ├── train.py                         <-- Training code
│   ├── train.sh                         <-- Hyperparameter script
│   └── inference.ipynb                  <-- Inference notebook
└── requirements.txt
```
### Background
Denoising Diffusion Probabilistic Models (DDPMs) are a class of generative models that learn to reverse a gradual noising process. The model is trained to predict the noise added to data at each step, and generates new samples by iteratively denoising from pure Gaussian noise.
A typical DDPM pipeline consists of three components:
- **Forward Process**: Gradually adds Gaussian noise to a data sample over T timesteps, producing a sequence x_0 → x_1 → … → x_T
- **Reverse Process**: A learned neural network iteratively denoises x_T back to x_0, step by step
- **Training Objective**: The network is trained using a simplified noise-matching loss: predicting the noise ε added at each step
---
## Task 1: Simple DDPM Pipeline with Swiss-Roll
In this task, you will implement a DDPM to learn a 2D Swiss-Roll distribution. This toy experiment lets you understand each component of the diffusion pipeline before scaling to images.
After completing your implementation, train the model and evaluate it by running `ddpm_tutorial.ipynb` in the `2d_plot_diffusion_todo` directory.
### TODO
#### 1-1: Build a Noise Prediction Network
Implement the noise prediction network in `network.py`. The network takes a noisy data point and a timestep embedding as input, and predicts the noise ε added at that step. It should consist of `TimeLinear` layers with feature dimensions:
```
[dim_in, dim_hids[0], ..., dim_hids[-1], dim_out]
```
- Every `TimeLinear` layer except the final output layer must be followed by a ReLU activation
- The final layer has no activation; it directly outputs the predicted noise
> **Hint**
> `TimeLinear` is a linear layer that is conditioned on a sinusoidal timestep embedding. The timestep embedding is added to the hidden features before the activation at each layer.
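For orientation, a minimal sketch of such a network is shown below. The `TimeLinear` stand-in and the constructor signature here are illustrative assumptions; follow the definitions in the starter `network.py`.

```python
import torch
import torch.nn as nn

class TimeLinear(nn.Module):
    # Illustrative stand-in for the starter code's TimeLinear: a linear
    # layer whose output is shifted by a projection of the timestep
    # embedding before the activation is applied.
    def __init__(self, dim_in, dim_out, dim_time_emb):
        super().__init__()
        self.fc = nn.Linear(dim_in, dim_out)
        self.time_proj = nn.Linear(dim_time_emb, dim_out)

    def forward(self, x, t_emb):
        return self.fc(x) + self.time_proj(t_emb)

class NoiseNet(nn.Module):
    # Stacks TimeLinear layers [dim_in, *dim_hids, dim_out] with ReLU
    # between layers and no activation on the output layer.
    def __init__(self, dim_in, dim_out, dim_hids, dim_time_emb=64):
        super().__init__()
        dims = [dim_in, *dim_hids, dim_out]
        self.layers = nn.ModuleList(
            [TimeLinear(i, o, dim_time_emb) for i, o in zip(dims[:-1], dims[1:])]
        )

    def forward(self, x, t_emb):
        for k, layer in enumerate(self.layers):
            x = layer(x, t_emb)
            if k < len(self.layers) - 1:
                x = torch.relu(x)
        return x
```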
#### 1-2: Construct the Forward and Reverse Process
In `ddpm.py`, implement the three core functions of the DDPM pipeline:
- **`q_sample(x_0, t, noise)`**: The forward process. Given a clean sample x_0 and timestep t, return the noised sample x_t using the closed-form formula:
```
x_t = √(ᾱ_t) · x_0 + √(1 − ᾱ_t) · ε,   where ε ~ N(0, I)
```
- **`p_sample(x_t, t)`**: One-step reverse transition. Use the trained network to predict ε, then compute the denoised estimate of x_{t-1}
- **`p_sample_loop(shape)`**: Full reverse process. Starting from x_T ~ N(0, I), iterate `p_sample()` from t=T down to t=1 and return the final sample x_0
> **Important**
> Use the pre-computed noise schedule (α_t, ᾱ_t, β_t) provided in the starter code. Do not redefine the schedule inside these functions.
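As a reference for the math, here is a hedged sketch of `q_sample` and `p_sample`, assuming the schedule tensors (`betas`, `alphas`, `alphas_cumprod`) are the precomputed 1D buffers from the starter code and that the network is called as `model(x_t, t)`; adapt the names and signatures to the actual pipeline.

```python
import torch

def q_sample(x0, t, noise, alphas_cumprod):
    # Forward process: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps.
    # t is a (batch,) LongTensor; gather per-sample abar_t and broadcast.
    abar_t = alphas_cumprod[t].unsqueeze(-1)              # (batch, 1)
    return abar_t.sqrt() * x0 + (1.0 - abar_t).sqrt() * noise

@torch.no_grad()
def p_sample(model, x_t, t, betas, alphas, alphas_cumprod):
    # One reverse step: posterior mean from the predicted noise, plus
    # sigma_t * z with sigma_t^2 = beta_t (no noise added at the last step).
    eps_hat = model(x_t, t)
    alpha_t = alphas[t].unsqueeze(-1)
    abar_t = alphas_cumprod[t].unsqueeze(-1)
    mean = (x_t - (1.0 - alpha_t) / (1.0 - abar_t).sqrt() * eps_hat) / alpha_t.sqrt()
    if int(t[0]) == 0:   # assumes the whole batch shares one timestep in the loop
        return mean
    return mean + betas[t].unsqueeze(-1).sqrt() * torch.randn_like(x_t)
```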
#### 1-3: Implement the Training Objective
In `ddpm.py`, implement `compute_loss()`. This function should:
1. Sample a random timestep t uniformly from {1, …, T} for each element in the batch
2. Sample noise ε ~ N(0, I) of the same shape as the input x_0
3. Compute the noised sample x_t using `q_sample()`
4. Pass x_t and t to the noise prediction network to obtain the predicted noise ε̂
5. Return the simplified noise-matching loss: **L = ||ε − ε̂||²**
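A sketch of the loss, reusing the `q_sample` sketched above (schedules here are 0-indexed, so `t ∈ {0, …, T−1}` corresponds to the handout's {1, …, T}):

```python
import torch
import torch.nn.functional as F

def compute_loss(model, x0, T, alphas_cumprod):
    # Simplified DDPM objective: L = E ||eps - eps_hat||^2.
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)  # uniform timesteps
    noise = torch.randn_like(x0)                               # eps ~ N(0, I)
    x_t = q_sample(x0, t, noise, alphas_cumprod)               # forward process
    eps_hat = model(x_t, t)                                    # predicted noise
    return F.mse_loss(eps_hat, noise)
```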
#### 1-4: Training and Evaluation
Once your implementation is complete, open and run `ddpm_tutorial.ipynb` via Jupyter Notebook. The notebook will automatically train the diffusion model and measure the Chamfer Distance (CD) between 2D particles sampled by the model and particles from the true Swiss-Roll distribution.
**Include in your report:**
- The training loss curve
- The Chamfer Distance (CD) value reported after running the notebook
- A visualization of the sampled 2D particles vs. the real Swiss-Roll distribution
---
## Task 2: ControlNet on Fill50K Dataset
In this task, you will implement ControlNet, a method that adds spatial conditioning (e.g., edge maps) to a pretrained Stable Diffusion model by attaching trainable copies of its encoder blocks, connected through zero-convolution layers.
### Prerequisites: Hugging Face Setup
Before beginning, set up Hugging Face access to download the pretrained Stable Diffusion model:
- Sign into Hugging Face at https://huggingface.co
- Obtain your Access Token at https://huggingface.co/settings/tokens
- Log in from your terminal:
```bash
$ huggingface-cli login
```
Install the ControlNet environment (it uses PyTorch 2.1 with CUDA 12.1, unlike the `ddpm` environment from Task 1, so creating a separate conda environment is advisable):
```bash
conda install pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt
```
Verify your setup by generating a test image with Stable Diffusion:
```python
import torch
from diffusers import StableDiffusionPipeline
model_id = "CompVis/stable-diffusion-v1-4"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
image = pipe("a photo of an astronaut riding a horse on mars").images[0]
image.save("test.png")
```
### TODO
#### Task 0: Generate Baseline Images
Using the 5 text prompts in `./task_1_controlnet/data/test_prompts.json`, generate 5 baseline images with the pretrained Stable Diffusion model (without ControlNet). These will serve as your comparison baseline in the report.
#### 2-1: Implement Zero-Convolution
In `diffusion/controlnet.py` (TODO 1), implement the zero-convolution operation. A zero-convolution is a 1×1 convolution layer whose weights and biases are both initialized to zero at the start of training. This ensures that ControlNet begins training without disrupting the pretrained Stable Diffusion outputs.
> **Hint**
> Use `nn.Conv2d(channels, channels, kernel_size=1)` and explicitly set `weight.data` and `bias.data` to zero after initialization.
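A minimal sketch of the idea (the starter code may wrap this in its own helper):

```python
import torch.nn as nn

def zero_conv(channels: int) -> nn.Conv2d:
    # 1x1 convolution whose weights and bias start at zero, so the
    # ControlNet branch initially contributes nothing to the UNet output.
    conv = nn.Conv2d(channels, channels, kernel_size=1)
    nn.init.zeros_(conv.weight)
    nn.init.zeros_(conv.bias)
    return conv
```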
#### 2-2: Initialize ControlNet from Pretrained UNet
In `diffusion/controlnet.py` (TODO 2), initialize the ControlNet encoder by copying weights from the pretrained UNet encoder blocks. This transfer learning approach allows ControlNet to start from a strong pretrained feature extractor rather than training from scratch.
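If the ControlNet module reuses the UNet's attribute names, the copy can go through the state dict; this is only an illustration, and the exact module layout in the starter code may differ:

```python
import torch.nn as nn

def init_from_unet(controlnet: nn.Module, unet: nn.Module):
    # Copy every parameter whose name and shape match the pretrained UNet;
    # strict=False skips ControlNet-only modules (zero-convolutions,
    # condition embedding), which keep their own initialization.
    result = controlnet.load_state_dict(unet.state_dict(), strict=False)
    return result.missing_keys, result.unexpected_keys
```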
#### 2-3: Apply Zero-Convolution to Residual Features
In `diffusion/controlnet.py` (TODO 3), apply the zero-convolution layers to the residual feature maps output by each ControlNet encoder block before they are passed to the UNet decoder. Specifically, for each block output h, compute:
```
h_out = ZeroConv(h)
```
#### 2-4: Integrate ControlNet Outputs into UNet
In `diffusion/unets/unet_2d_condition.py` (TODO 4), modify the UNet decoder to add the ControlNet residual features to the corresponding UNet decoder skip connections. Each ControlNet block output is added element-wise to the matching UNet decoder input:
```
decoder_input = decoder_input + controlnet_residual
```
> **Important**
> Do not apply any additional normalization to the ControlNet residuals before adding them to the UNet features. The zero-convolution already handles the initial scaling.
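Schematically, the merge inside the UNet forward pass could look like the following sketch (variable names are illustrative, not the starter code's):

```python
def merge_controlnet_residuals(down_block_res_samples, controlnet_down_residuals,
                               mid_sample, controlnet_mid_residual):
    # Element-wise addition of ControlNet residuals to the matching UNet
    # skip connections, and the same for the mid-block output.
    merged_skips = [
        skip + res
        for skip, res in zip(down_block_res_samples, controlnet_down_residuals)
    ]
    merged_mid = mid_sample + controlnet_mid_residual
    return merged_skips, merged_mid
```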
#### 2-5: Train and Evaluate
Train ControlNet on the Fill50K dataset (automatically downloaded by the `load_dataset()` function in `train.py`) by running:
```bash
$ sh train.sh
```
Then, run `inference.ipynb` to generate images conditioned on 5 different edge maps from `./data/test_conditions`, using the text prompts in `data/test_prompts.json`.
**Include in your report:**
- The 5 baseline images generated by Stable Diffusion (Task 0) with their text prompts
- The 5 condition inputs (edge maps), corresponding text prompts, and ControlNet-generated images
- A brief analysis of each condition: does the generated image accurately follow the edge map?
---
# Question 2 • 25 Marks
## Generative Adversarial Networks (GAN)
> Implement a Vanilla GAN on 2D Swiss-Roll data and a DCGAN on MNIST handwritten digits.
### Environment Setup
Create a conda environment named `gan_assignment` and install the required packages:
```bash
conda create --name gan_assignment python=3.10
conda activate gan_assignment
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
pip install -r requirements.txt
```
The `requirements.txt` includes: `numpy`, `matplotlib`, `scipy`, `tqdm`, and `jupyter`.
### Code Structure
```
gan_assignment/
├── task_1_vanilla_gan/          (Task 1)
│   ├── gan_tutorial.ipynb       <-- Main notebook
│   ├── dataset.py               <-- 2D toy dataset definitions
│   ├── network.py               <-- (TODO) Generator & Discriminator
│   └── gan.py                   <-- (TODO) GAN training pipeline
│
├── task_2_dcgan/                (Task 2)
│   ├── dcgan_tutorial.ipynb     <-- Main notebook
│   ├── network.py               <-- (TODO) DCGAN architecture
│   └── dcgan.py                 <-- (TODO) DCGAN training loop
└── requirements.txt
```
---
## Task 1: Vanilla GAN on 2D Swiss-Roll Data
Implement a fully-connected GAN to learn a 2D Swiss-Roll distribution. This toy experiment gives you hands-on experience with the adversarial training loop before scaling to image generation.
### TODO
#### 1-1: Build the Generator Network
Implement the `Generator` class in `network.py`. The Generator maps a noise vector z to a 2D output point:
- **Input**: noise vector z of shape `(batch_size, latent_dim)`, with `latent_dim = 16` by default
- **Architecture**: fully-connected layers with dimensions `[latent_dim, dim_hids[0], …, dim_hids[-1], 2]`
- **Activation**: ReLU after every hidden layer (except the final output layer)
- **Output**: 2D point of shape `(batch_size, 2)` with a Tanh activation on the last layer
> **Hint**
> Use `nn.Sequential` or `nn.ModuleList` to stack your layers.
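One possible layout, assuming illustrative hidden sizes (the starter code fixes the actual constructor signature):

```python
import torch.nn as nn

class Generator(nn.Module):
    # Maps z (batch, latent_dim) to 2D points in [-1, 1]^2 via Tanh.
    def __init__(self, latent_dim=16, dim_hids=(128, 128)):
        super().__init__()
        dims = [latent_dim, *dim_hids]
        layers = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers += [nn.Linear(d_in, d_out), nn.ReLU()]
        layers += [nn.Linear(dims[-1], 2), nn.Tanh()]  # no ReLU on the output
        self.net = nn.Sequential(*layers)

    def forward(self, z):
        return self.net(z)
```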
#### 1-2: Build the Discriminator Network
Implement the `Discriminator` class in `network.py`. The Discriminator takes a 2D point and outputs a real/fake probability:
- **Input**: a 2D point of shape `(batch_size, 2)`
- **Architecture**: fully-connected layers with dimensions `[2, dim_hids[0], …, dim_hids[-1], 1]`
- **Activation**: LeakyReLU (negative slope = 0.2) after every hidden layer
- **Output**: a scalar of shape `(batch_size, 1)` with a Sigmoid activation to produce a probability in [0, 1]
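And a matching Discriminator sketch under the same assumptions:

```python
import torch.nn as nn

class Discriminator(nn.Module):
    # Maps a 2D point to a real/fake probability in [0, 1].
    def __init__(self, dim_hids=(128, 128)):
        super().__init__()
        dims = [2, *dim_hids]
        layers = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers += [nn.Linear(d_in, d_out), nn.LeakyReLU(0.2)]
        layers += [nn.Linear(dims[-1], 1), nn.Sigmoid()]
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```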
#### 1-3: Implement the GAN Training Step
In `gan.py`, implement the `train_step()` function which performs one full update of both G and D:
**1. Discriminator update:**
- Sample a real batch x from the dataset
- Sample z ~ N(0, I) and generate fake samples: `x_fake = G(z)`
- Compute the discriminator BCE loss:
```
L_D = −E[log D(x_real)] − E[log(1 − D(x_fake.detach()))]
```
- Zero grad on D optimizer, backpropagate, and update D only
**2. Generator update:**
- Sample a new batch of z ~ N(0, I)
- Compute the non-saturating generator loss:
```
L_G = −E[log D(G(z))]
```
- Zero grad on G optimizer, backpropagate, and update G only
> **Important**
> Always call `.detach()` on `x_fake` before passing it to D during the discriminator update. This stops gradients from flowing back into G during D's update step.
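Putting the two updates together, a hedged sketch of `train_step()` (the argument list is illustrative; match the starter code's):

```python
import torch
import torch.nn.functional as F

def train_step(G, D, x_real, opt_G, opt_D, latent_dim, device):
    batch = x_real.shape[0]
    ones = torch.ones(batch, 1, device=device)
    zeros = torch.zeros(batch, 1, device=device)

    # Discriminator update: detach the fakes so no gradient reaches G.
    z = torch.randn(batch, latent_dim, device=device)
    x_fake = G(z)
    loss_D = (F.binary_cross_entropy(D(x_real), ones)
              + F.binary_cross_entropy(D(x_fake.detach()), zeros))
    opt_D.zero_grad()
    loss_D.backward()
    opt_D.step()

    # Generator update: non-saturating loss -E[log D(G(z))], expressed
    # as BCE against the "real" label.
    z = torch.randn(batch, latent_dim, device=device)
    loss_G = F.binary_cross_entropy(D(G(z)), ones)
    opt_G.zero_grad()
    loss_G.backward()
    opt_G.step()
    return loss_D.item(), loss_G.item()
```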
#### 1-4: Implement the Sampling Function
In `gan.py`, implement `sample(G, n_samples, latent_dim, device)`:
- Sample `n_samples` noise vectors z from N(0, I) with shape `(n_samples, latent_dim)`
- Pass through G to get generated 2D points
- Return as a NumPy array of shape `(n_samples, 2)`
- Use `torch.no_grad()` to disable gradient tracking during inference
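A minimal sketch:

```python
import torch

def sample(G, n_samples, latent_dim, device):
    # Draw z ~ N(0, I), map through G, return a (n_samples, 2) NumPy array.
    G.eval()
    with torch.no_grad():   # no gradient tracking at inference
        z = torch.randn(n_samples, latent_dim, device=device)
        points = G(z)
    return points.cpu().numpy()
```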
#### 1-5: Training and Evaluation
Run `gan_tutorial.ipynb`. The notebook trains the GAN for 5000 iterations and reports the Chamfer Distance (CD) between generated and real Swiss-Roll points.
**Include in your report:**
- G and D training loss curves (on the same plot or side-by-side)
- The Chamfer Distance (CD) value
- A scatter plot of generated 2D points vs. real Swiss-Roll data
- Brief analysis (2–3 sentences): did the GAN learn the distribution? Did you observe mode collapse or instability?
---
## Task 2: Deep Convolutional GAN (DCGAN) on MNIST
Implement a DCGAN to generate handwritten digit images. DCGAN replaces fully-connected layers with convolutional layers, significantly improving image generation quality.
### TODO
#### 2-1: Implement the DCGAN Generator
Implement `DCGenerator` in `task_2_dcgan/network.py` using transposed convolutions to upsample from noise to a full image:
- **Input**: noise vector z of shape `(batch_size, latent_dim, 1, 1)`, where `latent_dim = 100`
- Use `ConvTranspose2d` layers to upsample progressively to `(1, 28, 28)`
- **Channel sequence**: `latent_dim → 256 → 128 → 64 → 1`
- Apply `BatchNorm2d + ReLU` after every `ConvTranspose2d` except the last
- Apply Tanh to the final output
> **Tip**
> `ConvTranspose2d(kernel_size=4, stride=2, padding=1)` doubles spatial resolution. Use `kernel_size=4, stride=1, padding=0` for the first layer to go from 1×1 to 4×4.
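One possible layout is sketched below. Because 28 is not a power of two, the 4×4 → 7×7 step uses `kernel_size=3` rather than 4; everything else follows the tip above. Treat this as an illustration, not the required architecture.

```python
import torch.nn as nn

class DCGenerator(nn.Module):
    # Spatial path for 28x28 MNIST: 1x1 -> 4x4 -> 7x7 -> 14x14 -> 28x28.
    def __init__(self, latent_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(latent_dim, 256, 4, 1, 0), nn.BatchNorm2d(256), nn.ReLU(),
            nn.ConvTranspose2d(256, 128, 3, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.ConvTranspose2d(64, 1, 4, 2, 1), nn.Tanh(),  # no BatchNorm/ReLU here
        )

    def forward(self, z):
        return self.net(z)   # (batch, 1, 28, 28)
```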
#### 2-2: Implement the DCGAN Discriminator
Implement `DCDiscriminator` in `task_2_dcgan/network.py` using strided convolutions to downsample the input image:
- **Input**: grayscale image of shape `(batch_size, 1, 28, 28)`
- Use `Conv2d` layers to downsample to a single scalar output
- **Channel sequence**: `1 → 64 → 128 → 256 → 1`
- Apply `BatchNorm2d + LeakyReLU` (slope 0.2) after every `Conv2d` except the first and last
- Apply Sigmoid to the final output
> **Important**
> Do NOT apply BatchNorm to the first layer of the discriminator (raw pixel input) or the last layer. This is standard DCGAN practice for training stability.
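A matching sketch (spatial path 28 → 14 → 7 → 3 → 1; again illustrative):

```python
import torch.nn as nn

class DCDiscriminator(nn.Module):
    # No BatchNorm on the first layer (raw pixels) or the last (scalar output).
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 64, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(64, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.LeakyReLU(0.2),
            nn.Conv2d(128, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.LeakyReLU(0.2),
            nn.Conv2d(256, 1, 3, 1, 0), nn.Sigmoid(),
        )

    def forward(self, x):
        return self.net(x).view(-1, 1)   # (batch, 1) probability
```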
#### 2-3: Implement the DCGAN Training Loop
In `task_2_dcgan/dcgan.py`, implement `train_one_epoch()` which iterates over the full MNIST training set for one epoch. For each mini-batch:
**1. Discriminator update:**
- BCE loss on real images (label = 1) → `L_D_real`
- BCE loss on fake images G(z) (label = 0) → `L_D_fake`
- `L_D = L_D_real + L_D_fake` → `zero_grad`, `backward`, `step` the D optimizer
**2. Generator update:**
- Generate new fake images and compute: `L_G = BCE(D(G(z)), 1)`
- `zero_grad`, `backward`, `step` G optimizer
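A condensed sketch of the loop, assuming the dataloader yields `(image, label)` pairs with images normalized to [−1, 1] to match the generator's Tanh output:

```python
import torch
import torch.nn.functional as F

def train_one_epoch(G, D, loader, opt_G, opt_D, latent_dim, device):
    for x_real, _ in loader:                      # MNIST labels are unused
        x_real = x_real.to(device)
        b = x_real.size(0)
        ones = torch.ones(b, 1, device=device)
        zeros = torch.zeros(b, 1, device=device)

        # Discriminator: real batch vs. detached fakes.
        z = torch.randn(b, latent_dim, 1, 1, device=device)
        loss_D = (F.binary_cross_entropy(D(x_real), ones)
                  + F.binary_cross_entropy(D(G(z).detach()), zeros))
        opt_D.zero_grad()
        loss_D.backward()
        opt_D.step()

        # Generator: new fakes, scored against the "real" label.
        z = torch.randn(b, latent_dim, 1, 1, device=device)
        loss_G = F.binary_cross_entropy(D(G(z)), ones)
        opt_G.zero_grad()
        loss_G.backward()
        opt_G.step()
```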
#### 2-4: Weight Initialization
Implement `weights_init()` in `task_2_dcgan/network.py` and apply it via `model.apply(weights_init)`:
- `Conv2d` and `ConvTranspose2d`: initialize weights ~ N(0, 0.02)
- `BatchNorm2d`: initialize weights ~ N(1.0, 0.02), bias = 0
- All other layer types: leave unchanged
> **Hint**
> Use `isinstance(m, nn.Conv2d)` to check layer types. Use `torch.nn.init.normal_()` for weight initialization.
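A sketch that follows the specification directly:

```python
import torch.nn as nn

def weights_init(m):
    # DCGAN initialization: conv weights ~ N(0, 0.02);
    # BatchNorm weights ~ N(1.0, 0.02) with zero bias; others untouched.
    if isinstance(m, (nn.Conv2d, nn.ConvTranspose2d)):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
    elif isinstance(m, nn.BatchNorm2d):
        nn.init.normal_(m.weight, mean=1.0, std=0.02)
        nn.init.zeros_(m.bias)
```

Apply it once after constructing each network, e.g. `G.apply(weights_init)` and `D.apply(weights_init)`.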
#### 2-5: Training and Evaluation
Run `dcgan_tutorial.ipynb`. The notebook trains DCGAN on MNIST for 20 epochs, shows a 4×8 grid of generated digits per epoch, and reports the Fréchet Inception Distance (FID) score.
**Include in your report:**
- G and D training loss curves over all iterations
- A 4×8 grid of generated MNIST digits from your final trained model
- The FID score reported by the notebook
- Brief analysis (2–3 sentences): comment on image quality, diversity, and any observed instability
---
# Combined Submission Instructions
> **Both questions, one zip file, one PDF report**
## What to Submit
You will submit everything for both Question 1 (DDPM) and Question 2 (GAN) in a single zip file. There is no separate submission per question.
### Zip File Structure
Your zip file must follow this exact folder layout:
```
{NAME}_{STUDENT_ID}.zip
├── ddpm_assignment/
│   ├── 2d_plot_diffusion_todo/
│   │   ├── network.py                   <-- Your implementation
│   │   └── ddpm.py                      <-- Your implementation
│   └── task_1_controlnet/
│       └── diffusion/
│           ├── controlnet.py            <-- Your implementation
│           └── unets/
│               └── unet_2d_condition.py <-- Your implementation
│
├── gan_assignment/
│   ├── task_1_vanilla_gan/
│   │   ├── network.py                   <-- Your implementation
│   │   └── gan.py                       <-- Your implementation
│   └── task_2_dcgan/
│       ├── network.py                   <-- Your implementation
│       └── dcgan.py                     <-- Your implementation
│
└── {NAME}_{STUDENT_ID}.pdf              <-- Combined report
```
### Combined PDF Report
Write one single PDF report named `{NAME}_{STUDENT_ID}.pdf` that covers both questions. The report **must not exceed 5 pages** (excluding references). It should contain the following sections in order:
**Section 1 β DDPM (Question 1):**
- Task 1: Training loss curve, CD value, particle visualization, and 2–3 sentence analysis
- Task 2: 5 baseline SD images, 5 ControlNet results (condition + generated), and per-condition analysis
**Section 2 β GAN (Question 2):**
- Task 1: G and D loss curves, CD value, scatter plot of generated vs. real 2D points, and 2–3 sentence analysis
- Task 2: G and D loss curves, 4×8 generated MNIST grid, FID score, and 2–3 sentence analysis
### What NOT to Include
**Do NOT include in your zip:**
- Datasets or downloaded data folders (MNIST, Swiss-Roll, Fill50K, etc.)
- Model checkpoints (`.pth`, `.ckpt` files)
- Generated image folders
- Pretrained model weights (e.g., the Stable Diffusion checkpoint)
### Naming Convention
| Item       | Format                                                  |
|------------|---------------------------------------------------------|
| Zip file   | `{NAME}_{STUDENT_ID}.zip`, e.g. `JOHN_DOE_2024001.zip`  |
| PDF report | `{NAME}_{STUDENT_ID}.pdf`, e.g. `JOHN_DOE_2024001.pdf`  |
---
## Academic Integrity
You may consult the following reference papers while working on this assignment:
- Ho et al. (2020). *Denoising Diffusion Probabilistic Models.*
- Zhang et al. (2023). *Adding Conditional Control to Text-to-Image Diffusion Models (ControlNet).*
- Goodfellow et al. (2014). *Generative Adversarial Networks.*
- Radford et al. (2015). *Unsupervised Representation Learning with Deep Convolutional GANs (DCGAN).*
> It is strictly forbidden to copy, reformat, or directly reproduce code from online repositories or other students. All submitted code must be your own original implementation. Violations will result in a zero for the entire assignment. |