# 🌊 LiquidDiffusion

**An attention-free image generation model built on Liquid Neural Networks**

## What is this?

LiquidDiffusion is an image generation model that replaces attention with **Parallel CfC (Closed-form Continuous-depth) blocks** from Liquid Neural Network research. To our knowledge, no published paper combines LNNs with image generation — this project fills that gap.

### Key Properties

- ✅ **Zero attention layers** — fully convolutional + liquid time-gating
- ✅ **Fully parallelizable** — no ODE solvers, no sequential scanning, no recurrence
- ✅ **Latent-space training** — uses a pretrained SD-VAE (stabilityai/sd-vae-ft-mse, 83.7M params, frozen)
- ✅ **Fits in 16 GB VRAM** — the tiny config runs 256px at batch=8 on a T4 GPU
- ✅ **Simple training** — Rectified Flow (MSE velocity prediction, no noise schedule)
- ✅ **6 verified datasets** — all tested and working, with streaming support

## Quick Start (Colab)

1. Open `LiquidDiffusion_Training.ipynb` in Colab
2. Select a GPU runtime (T4)
3. Pick a dataset from the dropdown (default: huggan/AFHQv2 — animal faces)
4. Run all cells → training starts, with samples generated every 500 steps

## Architecture

```
Pixel Image (3×256×256)
  → [Frozen SD-VAE Encode]  → Latent (4×32×32)
  → [LiquidDiffusion U-Net] → Velocity prediction (4×32×32)
  → [Frozen SD-VAE Decode]  → Generated Image (3×256×256)
```

Each **LiquidDiffusionBlock** contains:

1. **AdaLN** — timestep conditioning via learned scale/shift
2. **ParallelCfCBlock** — the core liquid neural network layer (CfC Eq. 10)
3. **MultiScaleSpatialMix** — 3×3 + 5×5 + 7×7 depthwise convs + global pooling (replaces attention)
4. **FeedForward** — channel mixing via 1×1 conv

### The ParallelCfC Block

```python
# CfC Eq. 10 adapted for images:
gate   = σ(time_a(t_emb) · f(features) - time_b(t_emb))  # liquid time-gating
out    = gate · g(features) + (1 - gate) · h(features)   # CfC interpolation
α      = exp(-λ · |t_emb|)                               # liquid relaxation
output = α · input + (1 - α) · out                       # time-aware residual
```

## Verified Datasets

All tested and working, with streaming support:

| Dataset | Images | Description | Native Resolution |
|---------|--------|-------------|-------------------|
| `huggan/AFHQv2` | 16K | Animal faces (cats, dogs, wildlife) | 512×512 |
| `nielsr/CelebA-faces` | 202K | Celebrity faces | 178×218 |
| `huggan/flowers-102-categories` | 8K | Flower photographs | Variable |
| `reach-vb/pokemon-blip-captions` | 833 | Pokemon illustrations | 1280×1280 |
| `huggan/anime-faces` | 63K | Anime faces | 64×64 |
| `Norod78/cartoon-blip-captions` | ~3K | Cartoon characters | 512×512 |

## VAE

Uses **stabilityai/sd-vae-ft-mse** (83.7M params, frozen during training):

- 4 latent channels, 8× spatial downscale
- PSNR 27.3 on LAION-Aesthetics (excellent reconstruction)
- ~160 MB VRAM in fp16
- Scaling factor: 0.18215

## Model Configs

| Config | Params | 256px VRAM (w/ VAE) | 512px VRAM |
|--------|--------|---------------------|------------|
| tiny | ~23M | ~6 GB | ~12 GB |
| small | ~69M | ~10 GB | ~20 GB |
| base | ~154M | ~16 GB | ~30 GB |

## Training

**Objective**: Rectified Flow — a plain MSE loss on the velocity:

```python
x_t = (1 - t) · x0 + t · noise        # linear interpolation
v_target = noise - x0                 # constant velocity
loss = MSE(model(x_t, t), v_target)   # that's it!
```
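To make the objective and the Euler sampling concrete, here is a short runnable NumPy sketch. It is illustrative only: `rectified_flow_loss`, `euler_sample`, and `toy_model` are hypothetical names, not this repo's actual API.

```python
import numpy as np

def rectified_flow_loss(model, x0, rng):
    """Rectified-flow velocity MSE for a batch of latents x0 with shape (B, C, H, W)."""
    noise = rng.standard_normal(x0.shape)         # endpoint x1 ~ N(0, I)
    t = rng.uniform(size=(x0.shape[0], 1, 1, 1))  # one timestep per sample
    x_t = (1 - t) * x0 + t * noise                # linear interpolation
    v_target = noise - x0                         # constant velocity
    return float(np.mean((model(x_t, t) - v_target) ** 2))

def euler_sample(model, z, steps):
    """Euler integration from t=1 (noise) down to t=0 (data): z <- z - dt * v(z, t)."""
    dt = 1.0 / steps
    t = 1.0
    for _ in range(steps):
        t_batch = np.full((z.shape[0], 1, 1, 1), t)
        z = z - dt * model(z, t_batch)
        t -= dt
    return z

# Hypothetical stand-in model that always predicts zero velocity.
toy_model = lambda x, t: np.zeros_like(x)
rng = np.random.default_rng(0)

x0 = np.zeros((2, 4, 32, 32))                     # dummy SD-VAE-sized latents
loss = rectified_flow_loss(toy_model, x0, rng)
# With x0 = 0 and zero predictions, loss is the mean of noise**2, i.e. close to 1.

z1 = rng.standard_normal((1, 4, 32, 32))
x_hat = euler_sample(toy_model, z1, steps=25)     # zero velocity leaves z1 unchanged
```

In training, `model` would be the LiquidDiffusion U-Net operating on VAE latents; the sampler is the 25-50-step Euler ODE integration described below.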
**Sampling**: Euler ODE integration, 25-50 steps

## References

| Paper | Contribution |
|-------|--------------|
| [CfC Networks (Nature MI 2022)](https://arxiv.org/abs/2106.13898) | CfC Eq. 10, parallelizable closed form |
| [LTC Networks (AAAI 2021)](https://arxiv.org/abs/2006.04439) | Liquid time-constant ODE |
| [LiquidTAD (2024)](https://arxiv.org/abs/2604.18274) | Parallel liquid relaxation |
| [USM (CVPR 2025)](https://arxiv.org/abs/2504.13499) | U-Net + SSM for diffusion |
| [DiffuSSM (2023)](https://arxiv.org/abs/2311.18257) | SSM replaces attention in diffusion |
| [Rectified Flow (ICLR 2023)](https://arxiv.org/abs/2209.03003) | Simple velocity training |

## Files

```
├── liquid_diffusion/
│   ├── __init__.py
│   ├── model.py                     # Full model architecture
│   └── trainer.py                   # Trainer + dataset utilities
├── LiquidDiffusion_Training.ipynb   # Complete Colab notebook
├── test_model.py
└── README.md
```

## License

MIT