DDPM CIFAR-10 Diffusion Model

A Denoising Diffusion Probabilistic Model (DDPM) trained on CIFAR-10 for 300 epochs. This model generates 32×32 synthetic images across 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck).

Model Architecture

| Component | Specification |
|---|---|
| Architecture | U-Net with self-attention |
| Parameters | 26.8 M |
| Base channels | 128 |
| Channel multipliers | [1, 2, 2, 2] |
| Attention resolutions | 16×16, 8×8, 4×4 (multi-head, 4 heads) |
| ResBlocks per stage | 2 |
| Dropout | 0.1 |
| Normalization | GroupNorm (32 groups) |
| Activation | SiLU |
| Time embedding | Sinusoidal → MLP(128→512→512) |
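
The sinusoidal part of the time embedding can be sketched in plain Python as follows. This is a minimal illustration, not the model's own code: the function name `sinusoidal_embedding`, the sine-then-cosine ordering, and `max_period=10000` are assumptions that follow common practice. Only the width of 128 comes from the table above.

```python
import math

def sinusoidal_embedding(t, dim=128, max_period=10000.0):
    """Return a length-`dim` embedding for integer timestep `t`.

    The first half holds sines and the second half cosines, with
    frequencies spaced geometrically from 1 down toward 1/max_period.
    (Ordering and max_period are assumptions, not taken from the model.)
    """
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(t * f) for f in freqs] + [math.cos(t * f) for f in freqs]

emb = sinusoidal_embedding(t=500)
print(len(emb))  # 128
```

In the model, a vector like this would then pass through the MLP (128→512→512) before being injected into the ResBlocks.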

U-Net Data Flow

Input (3×32×32)
  → Init Conv (3→128)
  → Down[0]: ResBlocks(128) → 32×32 skip
  → Down[1]: ResBlocks(256) + SelfAttn → 16×16 skip
  → Down[2]: ResBlocks(256) + SelfAttn → 8×8 skip
  → Down[3]: ResBlocks(256) + SelfAttn → 4×4 skip
  → Mid: ResBlock + SelfAttn + ResBlock
  → Up[0]: SkipCat + ResBlocks(256) + SelfAttn → upsample 8×8
  → Up[1]: SkipCat + ResBlocks(256) + SelfAttn → upsample 16×16
  → Up[2]: SkipCat + ResBlocks(256) + SelfAttn → upsample 32×32
  → Up[3]: SkipCat + ResBlocks(128)
  → Out: GroupNorm + SiLU + Conv → 3×32×32
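
The encoder half of the diagram follows from the base width, the channel multipliers, and the attention resolutions. The sketch below is one reading of the diagram, assuming each stage halves the spatial size of the previous one. The helper name `stage_layout` is hypothetical and does not come from the repository.

```python
def stage_layout(base_channels=128, multipliers=(1, 2, 2, 2),
                 input_size=32, attention_sizes=(16, 8, 4)):
    """List (stage, channels, resolution, has_attention) for the encoder.

    Assumes stage i runs at input_size / 2**i, which matches the skip
    resolutions shown in the data-flow diagram.
    """
    layout = []
    for i, mult in enumerate(multipliers):
        size = input_size // (2 ** i)
        layout.append((i, base_channels * mult, size, size in attention_sizes))
    return layout

for stage, channels, size, attn in stage_layout():
    print(f"Down[{stage}]: {channels} channels at {size}x{size}, attention={attn}")
```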

Diffusion Process

  • Forward diffusion: Gaussian noising under a linear schedule; the network predicts the added noise ε (ε-prediction) rather than x₀
  • Schedule: Linear β schedule from 1e-4 to 0.02 over T=1000 timesteps
  • x₀ clipping: Predicted x₀ is clipped to [-1, 1] before computing the posterior mean (prevents numerical explosion)
  • Sampling: DDPM ancestral sampler with EMA shadow weights
  • Objective: Simple MSE loss between predicted and true noise

Training

| Setting | Value |
|---|---|
| Dataset | CIFAR-10 (50k) |
| Epochs | 300 |
| Batch size | 256 |
| Optimizer | AdamW |
| Learning rate | 2×10⁻⁴ |
| Mixed precision | BF16 (AMP) |
| EMA decay | 0.9999 (warmup) |
| Steps | 58,500 |
| Final loss | ~0.0547 |
| Hardware | RTX 5080 16 GB |
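
The step count is consistent with the other settings if the last partial batch of each epoch is dropped, which is an assumption here. The sketch below also shows one commonly used EMA warmup rule. The card only says "warmup", so the formula in `ema_decay_at` is a guess at the general idea, not the training script's rule, and all function names are hypothetical.

```python
def total_steps(dataset_size=50_000, batch_size=256, epochs=300):
    """Optimizer steps when the last partial batch of each epoch is dropped."""
    return (dataset_size // batch_size) * epochs

def ema_decay_at(step, max_decay=0.9999):
    """Warmed-up EMA decay: small early on, capped at max_decay.

    The (1 + step) / (10 + step) ramp is an assumed warmup rule.
    """
    return min(max_decay, (1 + step) / (10 + step))

def ema_update(shadow, weights, step, max_decay=0.9999):
    """Blend current weights into the shadow copy and return the new shadow."""
    d = ema_decay_at(step, max_decay)
    return [d * s + (1.0 - d) * w for s, w in zip(shadow, weights)]

print(total_steps())  # 58500
```

The warmup lets the shadow weights track the live weights closely at the start of training, when a fixed decay of 0.9999 would leave them dominated by the random initialization.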

Usage

import torch
import json
from safetensors.torch import load_file

from config import Config
from model import UNet
from diffusion import GaussianDiffusion

# Load config and model
with open('config.json') as f:
    cfg_dict = json.load(f)

cfg = Config()
cfg.model.base_channels = cfg_dict['base_channels']
cfg.model.channel_multipliers = tuple(cfg_dict['channel_multipliers'])
cfg.model.attention_resolutions = tuple(cfg_dict['attention_resolutions'])
# ... (set remaining fields from config.json)

model = UNet(cfg.model)
state_dict = load_file('model.safetensors')
model.load_state_dict(state_dict)
model.eval().cuda()

# Set up diffusion
diff = GaussianDiffusion(
    timesteps=1000,
    beta_start=1e-4,
    beta_end=0.02,
)

# Generate 64 images (8×8 grid)
with torch.no_grad():
    samples = diff.p_sample_loop(model, (64, 3, 32, 32), device='cuda')
    # samples.shape → (64, 3, 32, 32), range [-1, 1]
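
Because the samples lie in [-1, 1], they need rescaling before being saved as images. The dependency-free sketch below shows the per-value mapping and the row-major placement in an 8×8 grid. The helper names `to_uint8` and `grid_position` are hypothetical and not part of this repository.

```python
def to_uint8(value):
    """Map a sample value from [-1, 1] to an integer pixel in [0, 255]."""
    clamped = max(-1.0, min(1.0, value))
    return int(round((clamped + 1.0) * 127.5))

def grid_position(index, columns=8):
    """Row and column of sample `index` in a row-major grid."""
    return divmod(index, columns)

print(to_uint8(-1.0), to_uint8(1.0))  # 0 255
print(grid_position(63))              # (7, 7)
```

With the tensor returned by `p_sample_loop`, the same clamp-and-scale arithmetic applies elementwise.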

Samples

Training progression, with the same 8 random seeds tracked across the run:

| Step 500 | Step 30,000 | Step 58,500 (final) |
|---|---|---|
| Early blurry shapes | Semi-recognizable objects | Sharp, diverse CIFAR-10 samples |

Limitations

  • Resolution: Fixed 32×32 (CIFAR-10 native resolution)
  • Class conditioning: This is an unconditional model; no class labels used during training
  • FID: Not evaluated (training-in-progress checkpoint)
  • Artifacts: Some generated samples may have checkerboard artifacts or unnatural colors

Citation

@inproceedings{ho2020denoising,
  title={Denoising Diffusion Probabilistic Models},
  author={Ho, Jonathan and Jain, Ajay and Abbeel, Pieter},
  booktitle={Advances in Neural Information Processing Systems},
  year={2020}
}