---
license: mit
tags:
- image-generation
- flow-matching
- liquid-neural-networks
- mamba
- state-space-models
- physics-informed
- lightweight
- mobile-friendly
---

# LiquidFlow: Liquid-SSM Flow Matching Image Generator

A **novel lightweight architecture** for image generation that combines:

| Component | Source | Role |
|-----------|--------|------|
| **Liquid Time-Constant Networks** | [Hasani et al. 2020](https://arxiv.org/abs/2006.04439) | Adaptive ODE dynamics via the closed-form CfC solution, bounded by construction |
| **Selective State Space Models** | [Gu & Dao 2023 (Mamba)](https://arxiv.org/abs/2312.00752) | Linear-time long-range context, parallelizable scanning |
| **Zigzag Scanning** | [ZigMa 2024](https://arxiv.org/abs/2403.13802) | 2D spatial awareness through alternating scan patterns |
| **Physics-Informed Loss** | [Wang et al. 2020](https://arxiv.org/abs/2001.04536), [PIDM 2024](https://arxiv.org/abs/2403.14404) | Smoothness + TV regularization for training stability |
| **Rectified Flow Matching** | [Lipman et al. 2022](https://arxiv.org/abs/2210.02747) | ODE-based generation; no noise schedule tuning needed |

## Key Properties

- **Trainable on Google Colab free tier** (T4, 16 GB) and Kaggle
- **Mobile-deployable**: the tiny model is only ~6M params (~24 MB)
- **No custom CUDA kernels**: pure PyTorch, runs anywhere
- **No training collapse/explosion**: sigmoid gating in the Liquid CfC keeps dynamics bounded by construction
- **No noise schedule tuning**: flow matching uses simple linear interpolation (see the sketch below)

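For intuition, the entire flow matching training target fits in a few lines. This is an illustrative sketch, not code from the repo; the random batch `x1` stands in for real images:

```python
import torch

# Rectified flow target: no noise schedule, just a straight line between
# noise x0 and data x1.
x1 = torch.rand(16, 3, 128, 128) * 2 - 1       # dummy images scaled to [-1, 1]
x0 = torch.randn_like(x1)                      # Gaussian noise endpoint
t = torch.rand(x1.size(0)).view(-1, 1, 1, 1)   # one random time per sample
xt = (1 - t) * x0 + t * x1                     # point on the straight path
target_v = x1 - x0                             # constant velocity the model regresses
```
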
## Architecture

```
Noise x₀ ~ N(0,I) ──▶ LiquidFlow v_θ(xₜ, t) ──▶ Image x₁
                             │
                      ┌──────┴──────┐
                      │ Patchify    │ (image → non-overlapping patches)
                      │ + PosEmb    │ (2D learnable positions)
                      │ + DepthConv │ (local structure preservation)
                      └──────┬──────┘
                             │
                ┌────────────┴────────────┐
                │   L × LiquidSSM Block   │
                │  ┌───────────────────┐  │
                │  │ AdaLN (t-cond)    │  │ ← DiT-style conditioning
                │  │ Zigzag Scan       │  │ ← rotates scan pattern per layer
                │  │ SelectiveSSM      │  │ ← Mamba-style, input-dependent A,B,C,Δ
                │  │ + LiquidCfC       │  │ ← CfC gating: σ(−f_τ)⊙h + (1−σ(−f_τ))⊙f_x
                │  │ + FFN             │  │ ← GELU feed-forward
                │  │ + Skip Connect    │  │ ← U-Net style long skips
                │  └───────────────────┘  │
                └────────────┬────────────┘
                             │
                      ┌──────┴──────┐
                      │ DepthConv   │ (local refinement)
                      │ Unpatchify  │ (patches → image)
                      └──────┬──────┘
                             │
              velocity v_θ (same shape as input)
```

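The patchify/unpatchify steps are plain reshapes. A minimal sketch with `einops` (which the Quick Start installs); the repo's `model.py` may implement them differently:

```python
import torch
from einops import rearrange

x = torch.randn(2, 3, 128, 128)   # (B, C, H, W)
p = 4                             # patch size used by the tiny/small variants

# Split the image into non-overlapping p x p patches, one token per patch.
tokens = rearrange(x, 'b c (h p1) (w p2) -> b (h w) (p1 p2 c)', p1=p, p2=p)
print(tokens.shape)               # torch.Size([2, 1024, 48])

# Unpatchify is the exact inverse reshape.
x_back = rearrange(tokens, 'b (h w) (p1 p2 c) -> b c (h p1) (w p2)',
                   h=128 // p, p1=p, p2=p)
assert torch.equal(x, x_back)     # round-trip is lossless
```
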
### Core Innovation: Liquid CfC Cell

Instead of solving the Liquid ODE numerically (sequential, slow):
```
dx/dt = -[1/τ + f(x,I,t)] * x + f(x,I,t)
```

we use the **Closed-form Continuous-depth (CfC)** solution (parallel, fast, stable):
```python
gate = torch.sigmoid(-f_tau(x, h))         # adaptive time-constant gating in (0, 1)
new_h = gate * h + (1 - gate) * f_x(x, h)  # convex update of the hidden state
```

The **sigmoid gating guarantees** that each update is a convex combination of the previous state and the candidate, so hidden states stay bounded: no explosion or collapse is possible by construction.

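Putting the pieces together, a minimal self-contained cell might look like the following. The single linear heads and the `tanh` on the candidate are assumptions for illustration; the repo's `LiquidCfCCell` may use deeper heads:

```python
import torch
import torch.nn as nn

class CfCCellSketch(nn.Module):
    """Illustrative Liquid CfC cell (not the repo's exact LiquidCfCCell)."""

    def __init__(self, dim: int):
        super().__init__()
        self.f_tau = nn.Linear(2 * dim, dim)  # controls the adaptive time constant
        self.f_x = nn.Linear(2 * dim, dim)    # proposes the candidate state

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        xh = torch.cat([x, h], dim=-1)
        gate = torch.sigmoid(-self.f_tau(xh))  # gate in (0, 1)
        cand = torch.tanh(self.f_x(xh))        # bounded candidate (assumption)
        return gate * h + (1 - gate) * cand    # convex, hence bounded, update

cell = CfCCellSketch(dim=64)
h = torch.zeros(8, 64)
x = torch.randn(8, 64)
h = cell(x, h)  # one update step; repeat along the token sequence
```
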
### Dual-Path Processing

Each LiquidSSM Block has two parallel branches:
1. **SSM Branch**: selective scan (Mamba-style) with zigzag patterns, capturing global spatial dependencies
2. **Liquid Branch**: CfC cell, adding continuous-time adaptive dynamics

A learnable mixing coefficient `α` balances them: `output = α·SSM + (1-α)·Liquid`

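One simple way to realize the learnable mix, sketched under the assumption that `α` is a single sigmoid-squashed scalar (the repo may parameterize it per channel or per layer):

```python
import torch
import torch.nn as nn

class BranchMix(nn.Module):
    """Hypothetical sketch of the alpha-weighted branch combination."""

    def __init__(self):
        super().__init__()
        self.mix_logit = nn.Parameter(torch.zeros(1))  # alpha = 0.5 at init

    def forward(self, ssm_out: torch.Tensor, liquid_out: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self.mix_logit)          # keeps alpha in (0, 1)
        return alpha * ssm_out + (1 - alpha) * liquid_out
```
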
## Model Variants

| Variant | Params | Image Size | Patch | GPU VRAM (bs=16) | Use Case |
|---------|--------|------------|-------|------------------|----------|
| `tiny`  | 5.9M   | 128×128    | 4     | ~4 GB            | Quick experiments, mobile |
| `small` | 13.7M  | 128×128    | 4     | ~8 GB            | Production 128×128 |
| `base`  | 37.6M  | 256×256    | 8     | ~12 GB           | High quality |
| `512`   | 38.1M  | 512×512    | 16    | ~14 GB           | High resolution |

## Quick Start

### Colab / Kaggle (Recommended)

Open the notebook: **`LiquidFlow_Training.ipynb`**

It has interactive widgets for:
- Dataset selection (CIFAR-10, Flowers-102, CelebA, Fashion-MNIST, AFHQ, custom folder)
- Model size and all hyperparameters
- Auto batch-size adjustment for your GPU

### Command Line

```bash
pip install torch torchvision einops pillow matplotlib tqdm

# Quick test (CIFAR-10 32×32)
python liquidflow/train.py --model_size tiny --img_size 32 --dataset cifar10 --epochs 50 --batch_size 64

# Production (Flowers 128×128)
python liquidflow/train.py --model_size small --img_size 128 --dataset flowers --epochs 200 --batch_size 16

# Custom images
python liquidflow/train.py --model_size small --img_size 128 --dataset folder --data_dir /path/to/images
```

### Python API

```python
import torch
from liquidflow import liquidflow_small, euler_sample, make_grid_image

model = liquidflow_small(img_size=128)  # 13.7M params
# ... after training ...
model.eval()
images = euler_sample(model, (16, 3, 128, 128), num_steps=50, device='cuda')
grid = make_grid_image(images.clamp(-1, 1) * 0.5 + 0.5, nrow=4)  # map [-1,1] to [0,1]
grid.save('generated.png')
```

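For intuition about what `euler_sample` does, here is a minimal Euler integrator for the learned ODE dx/dt = v_θ(x, t). This is a sketch assuming `model(x, t)` returns the velocity; the packaged sampler may handle time discretization and conditioning differently:

```python
import torch

@torch.no_grad()
def euler_sample_sketch(model, shape, num_steps=50, device='cpu'):
    """Integrate dx/dt = v_theta(x, t) from noise (t=0) to an image (t=1)."""
    x = torch.randn(shape, device=device)            # x_0 ~ N(0, I)
    ts = torch.linspace(0.0, 1.0, num_steps + 1, device=device)
    for i in range(num_steps):
        t = ts[i].expand(shape[0])                   # broadcast time to the batch
        v = model(x, t)                              # predicted velocity field
        x = x + (ts[i + 1] - ts[i]) * v              # one Euler step
    return x                                         # approximate sample x_1
```
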
## File Structure

```
├── liquidflow/
│   ├── __init__.py            # Package exports
│   ├── model.py               # Core architecture (LiquidFlowNet, LiquidCfCCell, SelectiveSSM)
│   ├── losses.py              # Physics-informed flow matching loss + EMA
│   ├── sampling.py            # Euler & Heun ODE samplers
│   └── train.py               # Full training script with CLI
├── LiquidFlow_Training.ipynb  # Colab/Kaggle notebook
├── smoke_test.py              # Comprehensive CPU test suite (25 tests)
└── README.md
```

## Physics-Informed Loss

```
L = L_flow + λ_smooth · L_smooth + λ_tv · L_tv
```

| Term | Formula | Purpose |
|------|---------|---------|
| `L_flow` | `‖v_θ(xₜ,t) − (x₁−x₀)‖²` | Learn straight-line velocity field |
| `L_smooth` | `‖∇²x_pred‖²` (Laplacian) | Penalize high-frequency noise |
| `L_tv` | `‖∇x_pred‖₁` (total variation) | Edge-preserving smoothness |

Physics loss is **warmed up** over the first 500 steps.

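A compact sketch of how the two physics terms can be computed on a predicted image batch, assuming `x_pred` is an estimate of x₁ (the exact definitions live in `losses.py` and may differ):

```python
import torch
import torch.nn.functional as F

def physics_terms(x_pred: torch.Tensor):
    """Return (L_smooth, L_tv) for a batch of shape (B, C, H, W)."""
    c = x_pred.size(1)
    # Depthwise discrete Laplacian: penalizes high-frequency content.
    lap = torch.tensor([[0., 1., 0.],
                        [1., -4., 1.],
                        [0., 1., 0.]], device=x_pred.device)
    lap = lap.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    l_smooth = F.conv2d(x_pred, lap, groups=c).pow(2).mean()
    # Anisotropic total variation: L1 norm of horizontal + vertical differences.
    l_tv = (x_pred[..., :, 1:] - x_pred[..., :, :-1]).abs().mean() \
         + (x_pred[..., 1:, :] - x_pred[..., :-1, :]).abs().mean()
    return l_smooth, l_tv

# One possible warmup schedule (linear ramp is an assumption):
# warm = min(step / 500.0, 1.0)
# loss = l_flow + warm * (lam_smooth * l_smooth + lam_tv * l_tv)
```
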
## Recommended Experiments

| Goal | Dataset | Model | Image size | Epochs | Time (T4) |
|------|---------|-------|------------|--------|-----------|
| Sanity check | CIFAR-10 | tiny | 32 | 20 | ~5 min |
| Baseline | CIFAR-10 | tiny | 128 | 100 | ~2 hrs |
| Quality | Flowers-102 | small | 128 | 200 | ~4 hrs |
| Faces | CelebA | small | 128 | 50 | ~6 hrs |
| High-res | CelebA | 512 | 512 | 100 | ~12 hrs |

## Mobile Export

The notebook includes TorchScript and ONNX export cells. The `tiny` model produces a ~24 MB file for on-device inference.

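The export cells amount to something like the following. This is a hedged sketch assuming `model(x, t)` takes plain tensors and is traceable; the notebook's actual cells may differ in input shapes and opset:

```python
import torch

# `model` is a trained LiquidFlow net, e.g. from the Python API section above.
model.eval()
example_x = torch.randn(1, 3, 128, 128)  # dummy input image
example_t = torch.zeros(1)               # dummy timestep

# TorchScript via tracing (requires control flow independent of data values).
traced = torch.jit.trace(model, (example_x, example_t))
traced.save("liquidflow_tiny.pt")

# ONNX export of the same model.
torch.onnx.export(model, (example_x, example_t), "liquidflow_tiny.onnx",
                  input_names=["x", "t"], output_names=["velocity"],
                  opset_version=17)
```
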
## Verified (25/25 smoke tests pass)

- All 4 model variants: forward pass ✓
- Backward pass: all parameters receive gradients ✓
- Gradient health: no NaN, no Inf ✓
- Loss convergence: finite across optimizer steps ✓
- Individual components: LiquidCfCCell, SelectiveSSM, LiquidSSMBlock ✓
- Scan patterns: 4 patterns, all invertible ✓
- Sampling: Euler + Heun produce finite images ✓
- EMA: apply/restore cycle ✓
- Checkpoint: save/load round-trip ✓
- Physics loss: all terms finite and positive ✓

## References

1. Hasani et al., "Liquid Time-Constant Networks", AAAI 2021 ([2006.04439](https://arxiv.org/abs/2006.04439))
2. Hasani et al., "Closed-form Continuous-depth Models", Nature MI 2022
3. Gu & Dao, "Mamba: Linear-Time Sequence Modeling", 2023 ([2312.00752](https://arxiv.org/abs/2312.00752))
4. Teng et al., "DiM: Diffusion Mamba", 2024 ([2405.14224](https://arxiv.org/abs/2405.14224))
5. Hu et al., "ZigMa: Zigzag Mamba Diffusion", 2024 ([2403.13802](https://arxiv.org/abs/2403.13802))
6. Lipman et al., "Flow Matching for Generative Modeling", ICLR 2023 ([2210.02747](https://arxiv.org/abs/2210.02747))
7. Raissi et al., "Physics-Informed Neural Networks", JCP 2019 ([1711.10561](https://arxiv.org/abs/1711.10561))
8. Wang et al., "Gradient Pathologies in PINNs", 2020 ([2001.04536](https://arxiv.org/abs/2001.04536))
9. Bastek & Kochmann, "Physics-Informed Diffusion Models", 2024 ([2403.14404](https://arxiv.org/abs/2403.14404))
10. Zhu et al., "Vision Mamba", 2024 ([2401.09417](https://arxiv.org/abs/2401.09417))

## License

MIT