# LiquidDiffusion

A novel attention-free image generation model based on Liquid Neural Networks.
## What is this?
LiquidDiffusion is a first-of-its-kind image generation model that replaces attention with Parallel CfC (Closed-form Continuous-depth) blocks from Liquid Neural Network research. No existing paper combines LNNs with image generation; this project fills that gap.
## Key Properties
- ✅ Zero attention layers – fully convolutional + liquid time-gating
- ✅ Fully parallelizable – no ODE solvers, no sequential scanning, no recurrence
- ✅ Latent-space training – uses a pretrained SD-VAE (`stabilityai/sd-vae-ft-mse`, 83.7M params, frozen)
- ✅ Fits in 16 GB VRAM – the tiny config runs 256px at batch size 8 on a T4 GPU
- ✅ Simple training – Rectified Flow (MSE velocity prediction, no noise schedule)
- ✅ 6 verified datasets – all tested and working, with streaming support
## Quick Start (Colab)
- Open `LiquidDiffusion_Training.ipynb` in Colab
- Select a GPU runtime (T4)
- Pick a dataset from the dropdown (default: `huggan/AFHQv2`, animal faces)
- Run all cells – training starts, and samples are generated every 500 steps
## Architecture
```
Pixel Image (3×256×256)
  → [Frozen SD-VAE Encode]  → Latent (4×32×32)
  → [LiquidDiffusion U-Net] → Velocity prediction (4×32×32)
  → [Frozen SD-VAE Decode]  → Generated Image (3×256×256)
```
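The latent shape in the pipeline above follows directly from the VAE's 8× spatial downscale and 4 latent channels; a quick sanity check:

```python
H = W = 256                 # pixel resolution
downscale = 8               # SD-VAE spatial factor
latent_channels = 4

latent_shape = (latent_channels, H // downscale, W // downscale)
print(latent_shape)         # (4, 32, 32)
```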
Each `LiquidDiffusionBlock` contains:
- `AdaLN` – timestep conditioning via learned scale/shift
- `ParallelCfCBlock` – the core liquid neural network layer (CfC Eq. 10)
- `MultiScaleSpatialMix` – 3×3 + 5×5 + 7×7 depthwise convs + global pooling (replaces attention)
- `FeedForward` – channel mixing via 1×1 conv
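As a rough illustration of the AdaLN step, here is a minimal NumPy sketch; the `adaln` function and its `W_scale`/`W_shift` weights are hypothetical stand-ins for the learned timestep projections, not the project's actual code:

```python
import numpy as np

def adaln(h, t_emb, W_scale, W_shift, eps=1e-5):
    """Adaptive LayerNorm sketch: normalize the channel dimension of a
    (C, H, W) feature map, then modulate with a scale/shift computed
    from the timestep embedding t_emb of shape (D,)."""
    mu = h.mean(axis=0, keepdims=True)       # per-pixel channel mean
    var = h.var(axis=0, keepdims=True)       # per-pixel channel variance
    h_norm = (h - mu) / np.sqrt(var + eps)
    scale = W_scale @ t_emb                  # (C,) timestep-conditioned scale
    shift = W_shift @ t_emb                  # (C,) timestep-conditioned shift
    return (1 + scale)[:, None, None] * h_norm + shift[:, None, None]

rng = np.random.default_rng(0)
h = rng.standard_normal((8, 4, 4))           # toy (C, H, W) features
t_emb = rng.standard_normal(16)              # toy timestep embedding
out = adaln(h, t_emb, rng.standard_normal((8, 16)) * 0.01, np.zeros((8, 16)))
```

Note the common zero-init trick falls out naturally: with zero projection weights, AdaLN reduces to plain LayerNorm.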
### The ParallelCfC Block
```
# CfC Eq. 10 adapted for images:
gate   = σ(time_a(t_emb) · f(features) - time_b(t_emb))  # liquid time-gating
out    = gate · g(features) + (1 - gate) · h(features)   # CfC interpolation
α      = exp(-λ · |t_emb|)                               # liquid relaxation
output = α · input + (1 - α) · out                       # time-aware residual
```
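The pseudocode above can be exercised with a minimal NumPy sketch; here `f`, `g`, `h` stand in for the block's learned sub-layers, and `time_a`/`time_b` are reduced to scalars in place of the learned projections of `t_emb`:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def parallel_cfc(x, t_emb, f, g, h, time_a, time_b, lam=1.0):
    """Toy version of the CfC Eq. 10 gating: the output interpolates
    between two branches under a time-dependent gate, then mixes with
    the input through a liquid-relaxation residual."""
    gate = sigmoid(time_a * f(x) - time_b)      # liquid time-gating
    out = gate * g(x) + (1.0 - gate) * h(x)     # CfC interpolation
    alpha = np.exp(-lam * abs(t_emb))           # liquid relaxation, in (0, 1]
    return alpha * x + (1.0 - alpha) * out      # time-aware residual

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
y = parallel_cfc(x, t_emb=0.5,
                 f=lambda z: 0.5 * z, g=np.tanh, h=lambda z: z,
                 time_a=1.0, time_b=0.0)
```

Note how `t_emb = 0` makes the relaxation factor α = 1, so the block reduces to the identity: the residual path itself is time-aware, and no step requires recurrence or an ODE solver.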
## Verified Datasets
All tested and working (with streaming support):
| Dataset | Images | Description | Native Resolution |
|---|---|---|---|
| `huggan/AFHQv2` | 16K | Animal faces (cats, dogs, wildlife) | 512×512 |
| `nielsr/CelebA-faces` | 202K | Celebrity faces | 178×218 |
| `huggan/flowers-102-categories` | 8K | Flower photographs | Variable |
| `reach-vb/pokemon-blip-captions` | 833 | Pokemon illustrations | 1280×1280 |
| `huggan/anime-faces` | 63K | Anime faces | 64×64 |
| `Norod78/cartoon-blip-captions` | ~3K | Cartoon characters | 512×512 |
## VAE
Uses `stabilityai/sd-vae-ft-mse` (83.7M params, frozen during training):
- 4 latent channels, 8× spatial downscale
- PSNR 27.3 on LAION-Aesthetics (excellent reconstruction)
- ~160 MB VRAM in fp16
- Scaling factor: 0.18215
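The scaling factor is applied after encoding (and inverted before decoding) so latents are roughly unit-variance for the diffusion model; the helper names below are illustrative, not the project's API:

```python
SCALING_FACTOR = 0.18215  # Stable Diffusion VAE latent scaling convention

def to_model_space(vae_latents):
    """Scale raw VAE latents to ~unit variance for diffusion training."""
    return vae_latents * SCALING_FACTOR

def to_vae_space(model_latents):
    """Undo the scaling before decoding latents back to pixels."""
    return model_latents / SCALING_FACTOR
```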
## Model Configs
| Config | Params | 256px VRAM (w/ VAE) | 512px VRAM |
|---|---|---|---|
| tiny | ~23M | ~6 GB | ~12 GB |
| small | ~69M | ~10 GB | ~20 GB |
| base | ~154M | ~16 GB | ~30 GB |
## Training
Objective: Rectified Flow – simple MSE on velocity

```
x_t      = (1 - t) · x0 + t · noise      # linear interpolation
v_target = noise - x0                    # constant velocity
loss     = MSE(model(x_t, t), v_target)  # that's it!
```
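The three lines above are runnable essentially as written; a NumPy sketch, where `model` is any callable predicting velocity from `(x_t, t)`:

```python
import numpy as np

def rectified_flow_loss(model, x0, rng):
    """One Rectified Flow training step: sample t and noise, build the
    linear interpolant, and regress the constant velocity with MSE."""
    noise = rng.standard_normal(x0.shape)
    t = rng.uniform(0.0, 1.0)
    x_t = (1.0 - t) * x0 + t * noise          # linear interpolation
    v_target = noise - x0                     # constant velocity along the path
    return np.mean((model(x_t, t) - v_target) ** 2)

rng = np.random.default_rng(0)
x0 = np.zeros((4, 8, 8))                      # toy "data" batch
# With x0 = 0 we have x_t = t * noise, so x_t / t is an oracle velocity model:
loss = rectified_flow_loss(lambda x_t, t: x_t / t, x0, rng)
```

No noise schedule, no epsilon/v-prediction switch: the target is a plain difference of endpoints.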
Sampling: Euler ODE integration, 25-50 steps
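A minimal Euler sampler matching that description (illustrative names, not the project's API):

```python
import numpy as np

def euler_sample(velocity_fn, x_init, steps=25):
    """Integrate dx/dt = v(x, t) from t = 1 (pure noise) down to t = 0
    (data) with fixed-step Euler: x <- x - dt * v(x, t)."""
    x = x_init.copy()
    dt = 1.0 / steps
    t = 1.0
    for _ in range(steps):
        x = x - dt * velocity_fn(x, t)
        t -= dt
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 8))           # pretend data sample
noise = rng.standard_normal((4, 8, 8))
# The exact Rectified Flow velocity along the straight path is constant
# (noise - x0), so Euler integration from the noise endpoint recovers x0:
sample = euler_sample(lambda x, t: noise - x0, noise, steps=25)
```

Because the true velocity field is constant along each straight path, Euler is exact in this toy check; with a learned model, the 25–50 step count trades speed against fidelity.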
## References
| Paper | Contribution |
|---|---|
| CfC Networks (Nature MI 2022) | CfC Eq.10, parallelizable closed-form |
| LTC Networks (AAAI 2021) | Liquid time-constant ODE |
| LiquidTAD (2024) | Parallel liquid relaxation |
| USM (CVPR 2025) | U-Net + SSM for diffusion |
| DiffuSSM (2023) | SSM replaces attention in diffusion |
| Rectified Flow (ICLR 2023) | Simple velocity training |
## Files
```
├── liquid_diffusion/
│   ├── __init__.py
│   ├── model.py                      # Full model architecture
│   └── trainer.py                    # Trainer + dataset utilities
├── LiquidDiffusion_Training.ipynb    # Complete Colab notebook
├── test_model.py
└── README.md
```
## License
MIT