# LiquidDiffusion

A novel attention-free image generation model based on Liquid Neural Networks.
## What is this?
LiquidDiffusion is a first-of-its-kind image generation model that replaces attention with Parallel CfC (Closed-form Continuous-depth) blocks from Liquid Neural Network research. No existing paper combines LNNs with image generation; this project fills that gap.
## Key Properties
- ✅ Zero attention layers – fully convolutional + liquid time-gating
- ✅ Fully parallelizable – no ODE solvers, no sequential scanning, no recurrence
- ✅ Latent-space training – uses a pretrained SD-VAE (`stabilityai/sd-vae-ft-mse`, 83.7M params, frozen)
- ✅ Fits in 16 GB VRAM – the tiny config runs 256px at batch size 8 on a T4 GPU
- ✅ Simple training – Rectified Flow (MSE velocity prediction, no noise schedule)
- ✅ 6 verified datasets – all tested and working, with streaming support
## Quick Start (Colab)
- Open `LiquidDiffusion_Training.ipynb` in Colab
- Select a GPU runtime (T4)
- Pick a dataset from the dropdown (default: `huggan/AFHQv2`, animal faces)
- Run all cells – training starts, and samples are generated every 500 steps
## Architecture
```
Pixel Image (3×256×256)
  → [Frozen SD-VAE Encode]  → Latent (4×32×32)
  → [LiquidDiffusion U-Net] → Velocity prediction (4×32×32)
  → [Frozen SD-VAE Decode]  → Generated Image (3×256×256)
```
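The latent shape in the pipeline above follows directly from the VAE's 8× spatial downscale and 4 latent channels; a quick sanity check:

```python
H = W = 256                 # pixel resolution
downscale = 8               # SD-VAE spatial factor
latent_channels = 4

latent_shape = (latent_channels, H // downscale, W // downscale)
print(latent_shape)         # (4, 32, 32)
```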
Each `LiquidDiffusionBlock` contains:
- `AdaLN` – timestep conditioning via learned scale/shift
- `ParallelCfCBlock` – the core liquid neural network layer (CfC Eq. 10)
- `MultiScaleSpatialMix` – 3×3 + 5×5 + 7×7 depthwise convs + global pooling (replaces attention)
- `FeedForward` – channel mixing via 1×1 conv
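As a rough illustration of the AdaLN step, here is a minimal NumPy sketch; the `adaln` function and its `W_scale`/`W_shift` weights are hypothetical stand-ins for the learned timestep projections, not the project's actual code:

```python
import numpy as np

def adaln(h, t_emb, W_scale, W_shift, eps=1e-5):
    """Adaptive LayerNorm sketch: normalize the channel dimension of a
    (C, H, W) feature map, then modulate with a scale/shift computed
    from the timestep embedding t_emb of shape (D,)."""
    mu = h.mean(axis=0, keepdims=True)       # per-pixel channel mean
    var = h.var(axis=0, keepdims=True)       # per-pixel channel variance
    h_norm = (h - mu) / np.sqrt(var + eps)
    scale = W_scale @ t_emb                  # (C,) timestep-conditioned scale
    shift = W_shift @ t_emb                  # (C,) timestep-conditioned shift
    return (1 + scale)[:, None, None] * h_norm + shift[:, None, None]

rng = np.random.default_rng(0)
h = rng.standard_normal((8, 4, 4))           # toy (C, H, W) features
t_emb = rng.standard_normal(16)              # toy timestep embedding
out = adaln(h, t_emb, rng.standard_normal((8, 16)) * 0.01, np.zeros((8, 16)))
```

Note the common zero-init trick falls out naturally: with zero projection weights, AdaLN reduces to plain LayerNorm.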
### The ParallelCfC Block
```
# CfC Eq. 10 adapted for images:
gate   = σ(time_a(t_emb) · f(features) - time_b(t_emb))  # liquid time-gating
out    = gate · g(features) + (1 - gate) · h(features)   # CfC interpolation
α      = exp(-λ · |t_emb|)                               # liquid relaxation
output = α · input + (1 - α) · out                       # time-aware residual
```
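The pseudocode above can be exercised with a minimal NumPy sketch; here `f`, `g`, `h` stand in for the block's learned sub-layers, and `time_a`/`time_b` are reduced to scalars in place of the learned projections of `t_emb`:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def parallel_cfc(x, t_emb, f, g, h, time_a, time_b, lam=1.0):
    """Toy version of the CfC Eq. 10 gating: the output interpolates
    between two branches under a time-dependent gate, then mixes with
    the input through a liquid-relaxation residual."""
    gate = sigmoid(time_a * f(x) - time_b)      # liquid time-gating
    out = gate * g(x) + (1.0 - gate) * h(x)     # CfC interpolation
    alpha = np.exp(-lam * abs(t_emb))           # liquid relaxation, in (0, 1]
    return alpha * x + (1.0 - alpha) * out      # time-aware residual

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
y = parallel_cfc(x, t_emb=0.5,
                 f=lambda z: 0.5 * z, g=np.tanh, h=lambda z: z,
                 time_a=1.0, time_b=0.0)
```

Note how `t_emb = 0` makes the relaxation factor α = 1, so the block reduces to the identity: the residual path itself is time-aware, and no step requires recurrence or an ODE solver.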
## Verified Datasets
All tested and working (with streaming support):
| Dataset | Images | Description | Native Resolution |
|---|---|---|---|
| `huggan/AFHQv2` | 16K | Animal faces (cats, dogs, wildlife) | 512×512 |
| `nielsr/CelebA-faces` | 202K | Celebrity faces | 178×218 |
| `huggan/flowers-102-categories` | 8K | Flower photographs | Variable |
| `reach-vb/pokemon-blip-captions` | 833 | Pokemon illustrations | 1280×1280 |
| `huggan/anime-faces` | 63K | Anime faces | 64×64 |
| `Norod78/cartoon-blip-captions` | ~3K | Cartoon characters | 512×512 |
## VAE
Uses `stabilityai/sd-vae-ft-mse` (83.7M params, frozen during training):
- 4 latent channels, 8× spatial downscale
- PSNR 27.3 on LAION-Aesthetics (excellent reconstruction)
- ~160 MB VRAM in fp16
- Scaling factor: 0.18215
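The scaling factor is applied after encoding (and inverted before decoding) so latents are roughly unit-variance for the diffusion model; the helper names below are illustrative, not the project's API:

```python
SCALING_FACTOR = 0.18215  # Stable Diffusion VAE latent scaling convention

def to_model_space(vae_latents):
    """Scale raw VAE latents to ~unit variance for diffusion training."""
    return vae_latents * SCALING_FACTOR

def to_vae_space(model_latents):
    """Undo the scaling before decoding latents back to pixels."""
    return model_latents / SCALING_FACTOR
```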
## Model Configs
| Config | Params | 256px VRAM (w/ VAE) | 512px VRAM |
|---|---|---|---|
| tiny | ~23M | ~6 GB | ~12 GB |
| small | ~69M | ~10 GB | ~20 GB |
| base | ~154M | ~16 GB | ~30 GB |
## Training
Objective: Rectified Flow – simple MSE on velocity

```
x_t      = (1 - t) · x0 + t · noise      # linear interpolation
v_target = noise - x0                    # constant velocity
loss     = MSE(model(x_t, t), v_target)  # that's it!
```
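The three lines above are runnable essentially as written; a NumPy sketch, where `model` is any callable predicting velocity from `(x_t, t)`:

```python
import numpy as np

def rectified_flow_loss(model, x0, rng):
    """One Rectified Flow training step: sample t and noise, build the
    linear interpolant, and regress the constant velocity with MSE."""
    noise = rng.standard_normal(x0.shape)
    t = rng.uniform(0.0, 1.0)
    x_t = (1.0 - t) * x0 + t * noise          # linear interpolation
    v_target = noise - x0                     # constant velocity along the path
    return np.mean((model(x_t, t) - v_target) ** 2)

rng = np.random.default_rng(0)
x0 = np.zeros((4, 8, 8))                      # toy "data" batch
# With x0 = 0 we have x_t = t * noise, so x_t / t is an oracle velocity model:
loss = rectified_flow_loss(lambda x_t, t: x_t / t, x0, rng)
```

No noise schedule, no epsilon/v-prediction switch: the target is a plain difference of endpoints.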
Sampling: Euler ODE integration, 25-50 steps
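A minimal Euler sampler matching that description (illustrative names, not the project's API):

```python
import numpy as np

def euler_sample(velocity_fn, x_init, steps=25):
    """Integrate dx/dt = v(x, t) from t = 1 (pure noise) down to t = 0
    (data) with fixed-step Euler: x <- x - dt * v(x, t)."""
    x = x_init.copy()
    dt = 1.0 / steps
    t = 1.0
    for _ in range(steps):
        x = x - dt * velocity_fn(x, t)
        t -= dt
    return x

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 8))           # pretend data sample
noise = rng.standard_normal((4, 8, 8))
# The exact Rectified Flow velocity along the straight path is constant
# (noise - x0), so Euler integration from the noise endpoint recovers x0:
sample = euler_sample(lambda x, t: noise - x0, noise, steps=25)
```

Because the true velocity field is constant along each straight path, Euler is exact in this toy check; with a learned model, the 25–50 step count trades speed against fidelity.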
## References
| Paper | Contribution |
|---|---|
| CfC Networks (Nature MI 2022) | CfC Eq.10, parallelizable closed-form |
| LTC Networks (AAAI 2021) | Liquid time-constant ODE |
| LiquidTAD (2024) | Parallel liquid relaxation |
| USM (CVPR 2025) | U-Net + SSM for diffusion |
| DiffuSSM (2023) | SSM replaces attention in diffusion |
| Rectified Flow (ICLR 2023) | Simple velocity training |
## Files
```
├── liquid_diffusion/
│   ├── __init__.py
│   ├── model.py                      # Full model architecture
│   └── trainer.py                    # Trainer + dataset utilities
├── LiquidDiffusion_Training.ipynb    # Complete Colab notebook
├── test_model.py
└── README.md
```
## License
MIT