A Denoising Diffusion Probabilistic Model (DDPM) trained on CIFAR-10 for 300 epochs. This model generates 32×32 synthetic images across 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck).
| Component | Specification |
|---|---|
| Architecture | U-Net with self-attention |
| Parameters | 26.8 M |
| Base channels | 128 |
| Channel multipliers | [1, 2, 2, 2] |
| Attention resolutions | 16×16, 8×8, 4×4 (4 attention heads) |
| ResBlocks per stage | 2 |
| Dropout | 0.1 |
| Normalization | GroupNorm (32 groups) |
| Activation | SiLU |
| Time embedding | Sinusoidal → MLP(128→512→512) |
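
The "Sinusoidal → MLP(128→512→512)" entry means each integer timestep is first turned into a 128-dimensional sinusoidal vector, which the MLP then expands to 512 dimensions. The following is a minimal plain-Python sketch of the sinusoidal part only. The function name `sinusoidal_embedding`, the sine-then-cosine layout, and `max_period=10000` are assumptions based on the common DDPM convention, not details confirmed by this repository.

```python
import math
from typing import List


def sinusoidal_embedding(t: int, dim: int = 128, max_period: float = 10000.0) -> List[float]:
    """Return a `dim`-length sinusoidal embedding of integer timestep `t`.

    Hypothetical helper: layout and max_period are assumed, not taken from the repo.
    """
    half = dim // 2
    # Geometric ladder of frequencies, starting at 1 and decaying towards 1/max_period.
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    # First half holds sines, second half holds cosines.
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]


emb = sinusoidal_embedding(t=500)
print(len(emb))  # 128
```

In the real model this vector would be passed through the two-layer MLP listed in the table before being injected into each ResBlock.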
Input (3×32×32)
↓ Init Conv (3→128)
↓ Down[0]: ResBlocks(128) → 32×32 skip
↓ Down[1]: ResBlocks(256) + SelfAttn → 16×16 skip
↓ Down[2]: ResBlocks(256) + SelfAttn → 8×8 skip
↓ Down[3]: ResBlocks(256) + SelfAttn → 4×4 skip
↓ Mid: ResBlock + SelfAttn + ResBlock
↓ Up[0]: SkipCat + ResBlocks(256) + SelfAttn → upsample 8×8
↓ Up[1]: SkipCat + ResBlocks(256) + SelfAttn → upsample 16×16
↓ Up[2]: SkipCat + ResBlocks(256) + SelfAttn → upsample 32×32
↓ Up[3]: SkipCat + ResBlocks(128)
↓ Out: GroupNorm + SiLU + Conv → 3×32×32
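
The per-stage channel counts and resolutions in the diagram follow from the base channel width, the channel multipliers, and the attention resolutions. A small sketch of that arithmetic is given here. It assumes the resolution halves once per encoder stage and that attention is enabled whenever the stage resolution is listed, which matches the diagram but is not taken from the repository code.

```python
base_channels = 128
channel_multipliers = (1, 2, 2, 2)
attention_resolutions = (16, 8, 4)
image_size = 32
norm_groups = 32  # GroupNorm group count from the specification table

stages = []
for i, mult in enumerate(channel_multipliers):
    channels = base_channels * mult
    # Assumption: spatial resolution halves at each successive encoder stage.
    resolution = image_size // (2 ** i)
    # GroupNorm needs the channel count to be divisible by the group count.
    assert channels % norm_groups == 0
    stages.append((i, channels, resolution, resolution in attention_resolutions))

for i, channels, resolution, has_attention in stages:
    print(f"Down[{i}]: {channels} ch @ {resolution}x{resolution}, attention={has_attention}")
```

The decoder stages mirror these values in reverse order.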
| Setting | Value |
|---|---|
| Dataset | CIFAR-10 (50k) |
| Epochs | 300 |
| Batch size | 256 |
| Optimizer | AdamW |
| Learning rate | 2×10⁻⁴ |
| Mixed precision | BF16 (AMP) |
| EMA decay | 0.9999 (warmup) |
| Steps | 58,500 |
| Final loss | ~0.0547 |
| Hardware | RTX 5080 16 GB |
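
The step count in the table can be reproduced from the dataset size, batch size, and epoch count. The sketch below assumes the incomplete final batch of each epoch is dropped, which is one way to arrive at exactly 58,500 steps. It also shows the rough averaging window implied by the EMA decay, using the usual 1 / (1 - decay) rule of thumb and ignoring the warmup phase.

```python
train_images = 50_000
batch_size = 256
epochs = 300
ema_decay = 0.9999

# Assumption: the last partial batch of each epoch is dropped.
steps_per_epoch = train_images // batch_size
total_steps = steps_per_epoch * epochs

# Rule of thumb: an EMA with decay d averages over about 1 / (1 - d) recent steps.
ema_horizon = round(1 / (1 - ema_decay))

print(steps_per_epoch)  # 195
print(total_steps)      # 58500
print(ema_horizon)      # 10000
```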
import json

import torch
from safetensors.torch import load_file

# Project-local modules (not pip packages)
from config import Config
from diffusion import GaussianDiffusion
from model import UNet

# Load config and model
with open('config.json') as f:
    cfg_dict = json.load(f)
cfg = Config()
cfg.model.base_channels = cfg_dict['base_channels']
cfg.model.channel_multipliers = tuple(cfg_dict['channel_multipliers'])
cfg.model.attention_resolutions = tuple(cfg_dict['attention_resolutions'])
# ... (set remaining fields from config.json)
model = UNet(cfg.model)
state_dict = load_file('model.safetensors')
model.load_state_dict(state_dict)
model.eval().cuda()

# Set up diffusion
diff = GaussianDiffusion(
    timesteps=1000,
    beta_start=1e-4,
    beta_end=0.02,
)

# Generate 64 images (enough for an 8×8 grid)
with torch.no_grad():
    samples = diff.p_sample_loop(model, (64, 3, 32, 32), device='cuda')
# samples.shape == (64, 3, 32, 32), values in [-1, 1]
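
The three arguments passed to `GaussianDiffusion` define the noise schedule. As an illustration, the sketch below builds a linear schedule from those values in plain Python and tracks the cumulative signal fraction (often written as alpha-bar). That `GaussianDiffusion` interpolates the betas linearly is an assumption here, chosen because these endpoints match the linear schedule in Ho et al. (2020); the variable names are illustrative only.

```python
timesteps = 1000
beta_start, beta_end = 1e-4, 0.02

# Assumed linear interpolation between beta_start and beta_end.
betas = [
    beta_start + (beta_end - beta_start) * t / (timesteps - 1)
    for t in range(timesteps)
]

# alpha_bar[t] is the product of (1 - beta) up to step t:
# the fraction of the original image variance left after t noising steps.
alpha_bars = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bars.append(running)

print(f"alpha_bar at first step: {alpha_bars[0]:.4f}")
print(f"alpha_bar at last step:  {alpha_bars[-1]:.2e}")
```

Under this schedule the last value is very close to zero, so the final noised image is almost pure Gaussian noise, which is why sampling can start from random noise.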
Training progression, with the same 8 random seeds tracked across the run:
| Step 500 | Step 30,000 | Step 58,500 (final) |
|---|---|---|
| Early blurry shapes | Semi-recognizable objects | Sharp, diverse CIFAR-10 samples |
@inproceedings{ho2020denoising,
  title={Denoising Diffusion Probabilistic Models},
  author={Ho, Jonathan and Jain, Ajay and Abbeel, Pieter},
  booktitle={Advances in Neural Information Processing Systems},
  year={2020}
}