---
license: mit
tags:
- image-generation
- flow-matching
- liquid-neural-networks
- mamba
- state-space-models
- physics-informed
- lightweight
- mobile-friendly
---
# LiquidFlow: Liquid-SSM Flow Matching Image Generator
A novel lightweight architecture for image generation that combines:
| Component | Source | Role |
|---|---|---|
| Liquid Time-Constant Networks | Hasani et al. 2020 | Adaptive ODE dynamics via closed-form CfC; bounded by construction |
| Selective State Space Models | Gu & Dao 2023 (Mamba) | Linear-time long-range context, parallelizable scanning |
| Zigzag Scanning | ZigMa 2024 | 2D spatial awareness through alternating scan patterns |
| Physics-Informed Loss | Wang et al. 2020, PIDM 2024 | Smoothness + TV regularization for training stability |
| Rectified Flow Matching | Lipman et al. 2022 | ODE-based generation; no noise schedule tuning needed |
## Key Properties
- Trainable on Google Colab free tier (T4, 16 GB) and Kaggle
- Mobile-deployable: the tiny model is only ~6M params (~24 MB)
- No custom CUDA kernels: pure PyTorch, runs anywhere
- No training collapse/explosion: sigmoid gating in the Liquid CfC cell guarantees bounded dynamics
- No noise schedule tuning: flow matching uses simple linear interpolation
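The linear-interpolation point is easy to make concrete. A minimal sketch of how a rectified-flow training pair is built (function name hypothetical, not the repo's API):

```python
import torch

def flow_matching_pair(x1: torch.Tensor):
    """Build one rectified-flow training example from a clean image batch x1."""
    x0 = torch.randn_like(x1)             # noise endpoint
    t = torch.rand(x1.shape[0], 1, 1, 1)  # one timestep per sample in [0, 1)
    xt = (1 - t) * x0 + t * x1            # straight-line interpolation
    v_target = x1 - x0                    # constant target velocity along the line
    return xt, t.flatten(), v_target
```

The model is trained to predict `v_target` from `(xt, t)`; no beta schedule or variance bookkeeping is involved.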
## Architecture
```
Noise x₀ ~ N(0, I) ──▶ LiquidFlow v_θ(x_t, t) ──▶ Image x₁

           │
    ┌──────┴──────┐
    │  Patchify   │  (image → non-overlapping patches)
    │  + PosEmb   │  (2D learnable positions)
    │  + DepthConv│  (local structure preservation)
    └──────┬──────┘
           │
 ┌─────────┴───────────────┐
 │   L × LiquidSSM Block   │
 │  ┌───────────────────┐  │
 │  │ AdaLN (t-cond)    │  │ ← DiT-style conditioning
 │  │ Zigzag Scan       │  │ ← rotates scan pattern per layer
 │  │ SelectiveSSM      │  │ ← Mamba-style, input-dependent A, B, C, Δ
 │  │ + LiquidCfC       │  │ ← CfC gating: σ(−f_τ)⊙h + (1−σ(−f_τ))⊙f_x
 │  │ + FFN             │  │ ← GELU feed-forward
 │  │ + Skip Connect    │  │ ← U-Net style long skips
 │  └───────────────────┘  │
 └─────────┬───────────────┘
           │
    ┌──────┴──────┐
    │  DepthConv  │  (local refinement)
    │  Unpatchify │  (patches → image)
    └──────┬──────┘
           │
  velocity v_θ (same shape as input)
```
### Core Innovation: Liquid CfC Cell
Instead of solving the Liquid ODE numerically (sequential, slow):
```
dx/dt = -[1/τ + f(x, I, t)] · x + f(x, I, t)
```
We use the Closed-form Continuous-depth (CfC) solution (parallel, fast, stable):
```python
gate  = sigmoid(-f_tau(x, h))              # time-constant gating
new_h = gate * h + (1 - gate) * f_x(x, h)  # bounded update
```
Because the gate lies in (0, 1), each update is a convex combination of the previous hidden state and the candidate, so hidden states stay bounded: no explosion or collapse by construction.
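A minimal runnable version of the bounded update (a sketch under the pseudocode above, not the repo's `LiquidCfCCell`; the `tanh` on the candidate is an assumption added to keep it bounded in this illustration):

```python
import torch
import torch.nn as nn

class MiniCfC(nn.Module):
    """Illustrative CfC-style gate: new_h = sigma(-f_tau) * h + (1 - sigma(-f_tau)) * f_x."""
    def __init__(self, dim: int):
        super().__init__()
        self.f_tau = nn.Linear(2 * dim, dim)  # time-constant branch
        self.f_x = nn.Linear(2 * dim, dim)    # candidate-state branch

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        xh = torch.cat([x, h], dim=-1)
        gate = torch.sigmoid(-self.f_tau(xh))  # in (0, 1): bounded by construction
        cand = torch.tanh(self.f_x(xh))        # bounded candidate (illustrative choice)
        return gate * h + (1 - gate) * cand
```

Note there is no ODE solver loop here: the closed-form update is a single gated step, which is what makes it parallelizable across a sequence.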
### Dual-Path Processing
Each LiquidSSM Block has two parallel branches:
- SSM branch: selective scan (Mamba-style) with zigzag patterns; captures global spatial dependencies
- Liquid branch: CfC cell; adds continuous-time adaptive dynamics

A learnable mixing coefficient α balances them: `output = α·SSM + (1−α)·Liquid`
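A minimal sketch of that mixing, assuming α is stored as a logit so it stays in (0, 1) (the repo may parameterize it differently):

```python
import torch
import torch.nn as nn

class DualPathMix(nn.Module):
    """Learnable convex combination of the SSM and Liquid branch outputs (illustrative)."""
    def __init__(self):
        super().__init__()
        self.alpha_logit = nn.Parameter(torch.zeros(1))  # alpha = sigmoid(0) = 0.5 at init

    def forward(self, ssm_out: torch.Tensor, liquid_out: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self.alpha_logit)  # keeps alpha in (0, 1) during training
        return alpha * ssm_out + (1 - alpha) * liquid_out
```

Starting at α = 0.5 lets gradient descent decide per-layer how much global (SSM) versus continuous-time (Liquid) processing to use.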
## Model Variants
| Variant | Params | Image Size | Patch | GPU VRAM (bs=16) | Use Case |
|---|---|---|---|---|---|
| `tiny` | 5.9M | 128×128 | 4 | ~4 GB | Quick experiments, mobile |
| `small` | 13.7M | 128×128 | 4 | ~8 GB | Production 128×128 |
| `base` | 37.6M | 256×256 | 8 | ~12 GB | High quality |
| `512` | 38.1M | 512×512 | 16 | ~14 GB | High resolution |
## Quick Start
### Colab / Kaggle (Recommended)
Open the notebook `LiquidFlow_Training.ipynb`. It has interactive widgets for:
- Dataset selection (CIFAR-10, Flowers-102, CelebA, Fashion-MNIST, AFHQ, custom folder)
- Model size and all hyperparameters
- Auto batch-size adjustment for your GPU
### Command Line
```bash
pip install torch torchvision einops pillow matplotlib tqdm

# Quick test (CIFAR-10 32×32)
python liquidflow/train.py --model_size tiny --img_size 32 --dataset cifar10 --epochs 50 --batch_size 64

# Production (Flowers 128×128)
python liquidflow/train.py --model_size small --img_size 128 --dataset flowers --epochs 200 --batch_size 16

# Custom images
python liquidflow/train.py --model_size small --img_size 128 --dataset folder --data_dir /path/to/images
```
### Python API
```python
import torch
from liquidflow import liquidflow_small, euler_sample, make_grid_image

model = liquidflow_small(img_size=128)  # 13.7M params
# ... after training ...
model.eval()
images = euler_sample(model, (16, 3, 128, 128), num_steps=50, device='cuda')
grid = make_grid_image(images.clamp(-1, 1) * 0.5 + 0.5, nrow=4)  # map [-1,1] to [0,1]
grid.save('generated.png')
```
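Conceptually, `euler_sample` integrates the learned velocity field from noise (t=0) to data (t=1) with fixed explicit Euler steps; a sketch of that loop (not the repo's implementation):

```python
import torch

@torch.no_grad()
def euler_sample_sketch(model, shape, num_steps=50, device='cpu'):
    """Integrate dx/dt = v_theta(x, t) from t=0 (noise) to t=1 (image)."""
    x = torch.randn(shape, device=device)  # start from Gaussian noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((shape[0],), i * dt, device=device)
        x = x + dt * model(x, t)           # one explicit Euler step along the flow
    return x
```

Because rectified flow trains the field toward straight-line trajectories, even modest step counts (e.g. 50) give usable samples.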
## File Structure
```
├── liquidflow/
│   ├── __init__.py            # Package exports
│   ├── model.py               # Core architecture (LiquidFlowNet, LiquidCfCCell, SelectiveSSM)
│   ├── losses.py              # Physics-informed flow matching loss + EMA
│   ├── sampling.py            # Euler & Heun ODE samplers
│   └── train.py               # Full training script with CLI
├── LiquidFlow_Training.ipynb  # Colab/Kaggle notebook
├── smoke_test.py              # Comprehensive CPU test suite (25 tests)
└── README.md
```
## Physics-Informed Loss
```
L = L_flow + λ_smooth · L_smooth + λ_tv · L_tv
```
| Term | Formula | Purpose |
|---|---|---|
| L_flow | ‖v_θ(x_t, t) − (x₁ − x₀)‖² | Learn straight-line velocity field |
| L_smooth | ‖∇²x_pred‖² (Laplacian) | Penalize high-frequency noise |
| L_tv | ‖∇x_pred‖₁ (total variation) | Edge-preserving smoothness |

The physics terms are warmed up over the first 500 steps.
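The smoothness and TV terms can be sketched as follows (a minimal version; the kernel choice, padding mode, and mean reduction are assumptions, not the repo's `losses.py`):

```python
import torch
import torch.nn.functional as F

def tv_loss(x: torch.Tensor) -> torch.Tensor:
    """Anisotropic total variation: mean L1 norm of spatial finite differences."""
    dh = (x[..., 1:, :] - x[..., :-1, :]).abs().mean()
    dw = (x[..., :, 1:] - x[..., :, :-1]).abs().mean()
    return dh + dw

def laplacian_loss(x: torch.Tensor) -> torch.Tensor:
    """Smoothness term: mean squared response to a discrete 3x3 Laplacian kernel."""
    k = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]],
                     device=x.device).view(1, 1, 3, 3).repeat(x.shape[1], 1, 1, 1)
    xp = F.pad(x, (1, 1, 1, 1), mode='replicate')  # replicate padding avoids border artifacts
    lap = F.conv2d(xp, k, groups=x.shape[1])       # depthwise: one kernel per channel
    return (lap ** 2).mean()
```

Both terms vanish on constant images, so they only penalize high-frequency structure in the prediction, which is what stabilizes early training.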
## Recommended Experiments
| Goal | Dataset | Model | Image Size | Epochs | Time (T4) |
|---|---|---|---|---|---|
| Sanity check | CIFAR-10 | tiny | 32 | 20 | ~5 min |
| Baseline | CIFAR-10 | tiny | 128 | 100 | ~2 hrs |
| Quality | Flowers-102 | small | 128 | 200 | ~4 hrs |
| Faces | CelebA | small | 128 | 50 | ~6 hrs |
| High-res | CelebA | 512 | 512 | 100 | ~12 hrs |
## Mobile Export
The notebook includes TorchScript and ONNX export cells. The tiny model produces a ~24MB file for on-device inference.
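A TorchScript export along the lines of the notebook cell might look like this (a sketch; assumes the model's forward takes `(x, t)`, and the file name is illustrative):

```python
import torch

def export_torchscript(model, img_size=128, path='liquidflow_tiny.pt'):
    """Trace the velocity network with example inputs and save it for on-device use."""
    model.eval()
    x = torch.randn(1, 3, img_size, img_size)  # example image-shaped input
    t = torch.rand(1)                          # example timestep
    traced = torch.jit.trace(model, (x, t))    # record the forward graph
    traced.save(path)
    return traced
```

Because the architecture is pure PyTorch with no custom kernels, tracing needs no special handling; the same pair of example inputs also works for `torch.onnx.export`.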
## Verified (25/25 smoke tests pass)
- All 4 model variants: forward pass ✓
- Backward pass: all parameters receive gradients ✓
- Gradient health: no NaN, no Inf ✓
- Loss convergence: finite across optimizer steps ✓
- Individual components: LiquidCfCCell, SelectiveSSM, LiquidSSMBlock ✓
- Scan patterns: 4 patterns, all invertible ✓
- Sampling: Euler + Heun produce finite images ✓
- EMA: apply/restore cycle ✓
- Checkpoint: save/load round-trip ✓
- Physics loss: all terms finite and positive ✓
## References
- Hasani et al., "Liquid Time-Constant Networks", AAAI 2021 (2006.04439)
- Hasani et al., "Closed-form Continuous-depth Models", Nature MI 2022
- Gu & Dao, "Mamba: Linear-Time Sequence Modeling", 2023 (2312.00752)
- Teng et al., "DiM: Diffusion Mamba", 2024 (2405.14224)
- Hu et al., "ZigMa: Zigzag Mamba Diffusion", 2024 (2403.13802)
- Lipman et al., "Flow Matching for Generative Modeling", ICLR 2023
- Raissi et al., "Physics-Informed Neural Networks", JCP 2019 (1711.10561)
- Wang et al., "Gradient Pathologies in PINNs", 2020 (2001.04536)
- Bastek & Kochmann, "Physics-Informed Diffusion Models", 2024 (2403.14404)
- Zhu et al., "Vision Mamba", 2024 (2401.09417)
## License
MIT