---
license: mit
pipeline_tag: unconditional-image-generation
tags:
- diffusion
- rectified-flow
- patch-diffusion
- anime
---

# Waifu Diffusion

A 130M-parameter diffusion model trained on 10,000 anime faces (90% monochrome) using **rectified flow**, **patch diffusion**, and **CIELAB color space decoupling**.

## Model Details

- **Architecture**: Diffusion Transformer (DiT-B) with Vision RoPE
- **Parameters**: 130M
- **Training Data**: 10k anime faces (80×80), 90% corrupted to grayscale
- **Training Steps**: 1280 epochs × batch 256
- **Sampling**: 50-step Euler integration

### Versions

| Model | Details |
|-------|---------|
| `waifu_diffusion_1280_bs256.safetensors` | Full training run (1280 epochs, bs=256) |
| `waifu_diffusion_128_bs32.safetensors` | Shallow training run (128 epochs, bs=32) |

## Quick Start

```python
import numpy as np
import torch
from safetensors.torch import load_file
from skimage import color

# Load model
# JiT is the DiT-B model class; import it from the GitHub repository linked below
model = JiT(
    input_size=80,
    patch_size=4,
    in_channels=3,
    hidden_size=768,
    depth=12,
    num_heads=12,
    num_classes=1,
)
state_dict = load_file("waifu_diffusion_1280_bs256.safetensors")
model.load_state_dict(state_dict)
model.eval()

# Generate with 50-step Euler integration of the rectified flow
device = "cuda"
model.to(device)
with torch.no_grad():
    xt = torch.randn((1, 3, 80, 80), device=device)
    y = torch.zeros(1, dtype=torch.long, device=device)
    for step in range(50):
        t = torch.tensor(step / 50, device=device)
        pred_x1 = model(xt, t, y, top_idx=0, left_idx=0)
        # Rectified-flow velocity v = (x1 - xt) / (1 - t), clamped for stability
        v = (pred_x1 - xt) / max(1.0 - step / 50, 1e-2)
        xt = xt + v / 50

# Convert CIELAB → RGB
lab = torch.clamp(pred_x1[0], -1, 1).cpu().numpy()
L = (lab[0] + 1) * 50  # L in [0, 100]
a = lab[1] * 128       # a in [-128, 128]
b = lab[2] * 128       # b in [-128, 128]
rgb = color.lab2rgb(np.stack([L, a, b], axis=-1))
```

## Key Techniques

- **Rectified Flow**: Straight-line paths from noise to data (50 sampling steps vs. the ~1000 of DDPM)
- **CIELAB Decoupling**: Separates luminance from color; gradients on the color channels are masked for monochrome images, so structure is learned from all 10k images and color from the ~1k color ones
- **Patch Diffusion**: Random 40×80 px crops act as data augmentation, effectively expanding 10k images to ~50k training samples
- **Vision RoPE**: 2D rotary position embeddings keep spatial structure consistent across patches

## Links

- **GitHub**: https://github.com/ruwwww/waifu_diffusion
- **Blog Post**: [Training a Waifu Diffusion Model](https://ruwwww.github.io/al-folio/blog/2026/waifu-diffusion/)

## Citation

```bibtex
@misc{waifu_diffusion_2026,
  author = {Abdurrahman Izzuddin Al Faruq},
  title  = {Training a Waifu Diffusion Model with Patch Diffusion and Rectified Flow},
  year   = {2026},
  url    = {https://github.com/ruwwww/waifu_diffusion}
}
```

## License

MIT
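
## Appendix: CIELAB Gradient Masking Sketch

The CIELAB decoupling described under Key Techniques can be expressed as a masked regression loss. The sketch below is illustrative only, not the repository's actual training code: `masked_lab_loss` is a hypothetical helper, and the loss shape (per-channel MSE on the model's prediction target) is an assumption. The idea it demonstrates is the one stated above: the luminance channel (L) receives gradients from every image, while the color channels (a/b) receive gradients only from the minority of color images.

```python
import torch


def masked_lab_loss(pred, target, is_monochrome):
    # pred/target: (B, 3, H, W) in normalized CIELAB;
    # channel 0 = L (luminance), channels 1:3 = a/b (color).
    # is_monochrome: (B,) bool tensor, True for grayscale samples.
    sq_err = (pred - target) ** 2

    # Luminance (structure) is supervised on every image.
    loss_L = sq_err[:, 0].mean()

    # a/b (color) is supervised only where the image actually has color;
    # zeroing the error here also zeroes its gradient.
    color = (~is_monochrome).float().view(-1, 1, 1, 1)  # (B, 1, 1, 1)
    n_ab_per_image = sq_err[:, 1:].numel() / pred.shape[0]  # 2 * H * W
    denom = color.sum() * n_ab_per_image + 1e-8
    loss_ab = (sq_err[:, 1:] * color).sum() / denom

    return loss_L + loss_ab
```

With a 90%-monochrome batch, roughly one image in ten contributes to `loss_ab`, while all ten contribute to `loss_L`, matching the "structure from all 10k, color from ~1k" split.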