---
license: mit
tags:
- image-generation
- flow-matching
- liquid-neural-networks
- mamba
- state-space-models
- physics-informed
- lightweight
- mobile-friendly
---
# LiquidFlow: Liquid-SSM Flow Matching Image Generator
A **novel lightweight architecture** for image generation that combines:
| Component | Source | Role |
|-----------|--------|------|
| **Liquid Time-Constant Networks** | [Hasani et al. 2020](https://arxiv.org/abs/2006.04439) | Adaptive ODE dynamics via the closed-form CfC solution; bounded by construction |
| **Selective State Space Models** | [Gu & Dao 2023 (Mamba)](https://arxiv.org/abs/2312.00752) | Linear-time long-range context, parallelizable scanning |
| **Zigzag Scanning** | [ZigMa 2024](https://arxiv.org/abs/2403.13802) | 2D spatial awareness through alternating scan patterns |
| **Physics-Informed Loss** | [Wang et al. 2020](https://arxiv.org/abs/2001.04536), [PIDM 2024](https://arxiv.org/abs/2403.14404) | Smoothness + TV regularization for training stability |
| **Rectified Flow Matching** | [Lipman et al. 2022](https://arxiv.org/abs/2210.02747) | ODE-based generation; no noise schedule tuning needed |
## Key Properties
- **Trainable on Google Colab free tier** (T4, 16 GB) and Kaggle
- **Mobile-deployable**: the tiny model is only ~6M params (~24 MB)
- **No custom CUDA kernels**: pure PyTorch, runs anywhere
- **No training collapse/explosion**: sigmoid gating in the Liquid CfC keeps dynamics bounded
- **No noise schedule tuning**: flow matching uses simple linear interpolation
## Architecture
```
Noise x₀ ~ N(0,I) ──▶ LiquidFlow v_θ(x_t, t) ──▶ Image x₁
                       │
               ┌───────┴───────┐
               │ Patchify      │ (image → non-overlapping patches)
               │ + PosEmb      │ (2D learnable positions)
               │ + DepthConv   │ (local structure preservation)
               └───────┬───────┘
                       │
        ┌──────────────┼──────────────┐
        │     L × LiquidSSM Block     │
        │   ┌─────────────────────┐   │
        │   │ AdaLN (t-cond)      │   │ ← DiT-style conditioning
        │   │ Zigzag Scan         │   │ ← rotates scan pattern per layer
        │   │ SelectiveSSM        │   │ ← Mamba-style, input-dependent A, B, C, Δ
        │   │ + LiquidCfC         │   │ ← CfC gating: σ(-f_τ)⊙h + (1-σ(-f_τ))⊙f_x
        │   │ + FFN               │   │ ← GELU feed-forward
        │   │ + Skip Connect      │   │ ← U-Net style long skips
        │   └─────────────────────┘   │
        └──────────────┼──────────────┘
                       │
               ┌───────┴───────┐
               │ DepthConv     │ (local refinement)
               │ Unpatchify    │ (patches → image)
               └───────┬───────┘
                       │
         velocity v_θ (same shape as input)
```
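The zigzag scan used in each block can be illustrated with a small index-permutation sketch. The function names here are illustrative, not the repository's API; the actual `model.py` may generate its scan variants differently:

```python
def zigzag_order(h, w, variant=0):
    """Return a permutation of patch indices for an h x w grid.

    variant 0: row-major, alternating left/right direction per row
    variant 1: the same pattern transposed (column-wise zigzag)
    Reversing either order gives two further variants.
    """
    order = []
    if variant == 0:
        for r in range(h):
            cols = range(w) if r % 2 == 0 else range(w - 1, -1, -1)
            order.extend(r * w + c for c in cols)
    else:
        for c in range(w):
            rows = range(h) if c % 2 == 0 else range(h - 1, -1, -1)
            order.extend(r * w + c for r in rows)
    return order

def inverse_order(order):
    """Invert a permutation so scanned tokens can be mapped back to grid order."""
    inv = [0] * len(order)
    for pos, idx in enumerate(order):
        inv[idx] = pos
    return inv
```

Because every scan is a pure permutation, it is exactly invertible, which is what the smoke tests below check ("4 patterns, all invertible").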
### Core Innovation: Liquid CfC Cell
Instead of solving the Liquid ODE numerically (sequential, slow):
```
dx/dt = -[1/τ + f(x,I,t)] * x + f(x,I,t)
```
We use the **Closed-form Continuous-depth (CfC)** solution (parallel, fast, stable):
```python
gate = sigmoid(-f_tau(x, h)) # time-constant gating
new_h = gate * h + (1 - gate) * f_x(x, h) # bounded update
```
Because the sigmoid gate lies strictly between 0 and 1, each update is a **convex combination** of the previous hidden state and the candidate `f_x`; hidden states therefore cannot explode or collapse by construction, as long as `f_x` itself is bounded.
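A minimal PyTorch sketch of this update, with the `f_tau`/`f_x` names following the text (this is illustrative, not the repository's `LiquidCfCCell`; the `tanh` on the candidate is an assumption that keeps `f_x` bounded):

```python
import torch
import torch.nn as nn

class MinimalCfCCell(nn.Module):
    """Illustrative CfC-style update: gate in (0,1) => bounded convex update."""
    def __init__(self, dim):
        super().__init__()
        self.f_tau = nn.Linear(2 * dim, dim)  # time-constant (gating) branch
        self.f_x = nn.Linear(2 * dim, dim)    # candidate-state branch

    def forward(self, x, h):
        xh = torch.cat([x, h], dim=-1)
        gate = torch.sigmoid(-self.f_tau(xh))             # in (0, 1)
        cand = torch.tanh(self.f_x(xh))                   # bounded candidate
        return gate * h + (1 - gate) * cand               # convex combination
```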
### Dual-Path Processing
Each LiquidSSM Block has two parallel branches:
1. **SSM Branch**: Selective scan (Mamba-style) with zigzag patterns β captures global spatial dependencies
2. **Liquid Branch**: CfC cell β adds continuous-time adaptive dynamics
A learnable mixing coefficient `α` balances them: `output = α·SSM + (1-α)·Liquid`
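The mixing can be sketched as follows; parametrizing `α` through a sigmoid (an assumption, the actual block may parametrize it differently) keeps it in (0, 1):

```python
import torch
import torch.nn as nn

class DualPathMix(nn.Module):
    """Sketch of the learnable SSM/Liquid branch mixing described above."""
    def __init__(self, dim):
        super().__init__()
        self.alpha_logit = nn.Parameter(torch.zeros(dim))  # sigmoid(0) = 0.5

    def forward(self, ssm_out, liquid_out):
        alpha = torch.sigmoid(self.alpha_logit)            # per-channel, in (0, 1)
        return alpha * ssm_out + (1 - alpha) * liquid_out
```

Initializing the logit at zero starts training with an even 50/50 blend of the two branches.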
## Model Variants
| Variant | Params | Image Size | Patch | GPU VRAM (bs=16) | Use Case |
|---------|--------|------------|-------|------------------|----------|
| `tiny` | 5.9M | 128×128 | 4 | ~4 GB | Quick experiments, mobile |
| `small` | 13.7M | 128×128 | 4 | ~8 GB | Production 128×128 |
| `base` | 37.6M | 256×256 | 8 | ~12 GB | High quality |
| `512` | 38.1M | 512×512 | 16 | ~14 GB | High resolution |
## Quick Start
### Colab / Kaggle (Recommended)
Open the notebook: **`LiquidFlow_Training.ipynb`**
It has interactive widgets for:
- Dataset selection (CIFAR-10, Flowers-102, CelebA, Fashion-MNIST, AFHQ, custom folder)
- Model size and all hyperparameters
- Auto batch-size adjustment for your GPU
### Command Line
```bash
pip install torch torchvision einops pillow matplotlib tqdm
# Quick test (CIFAR-10 32×32)
python liquidflow/train.py --model_size tiny --img_size 32 --dataset cifar10 --epochs 50 --batch_size 64
# Production (Flowers 128×128)
python liquidflow/train.py --model_size small --img_size 128 --dataset flowers --epochs 200 --batch_size 16
# Custom images
python liquidflow/train.py --model_size small --img_size 128 --dataset folder --data_dir /path/to/images
```
### Python API
```python
from liquidflow import liquidflow_small, euler_sample, make_grid_image
import torch
model = liquidflow_small(img_size=128) # 13.7M params
# ... after training ...
model.eval()
images = euler_sample(model, (16, 3, 128, 128), num_steps=50, device='cuda')
grid = make_grid_image(images.clamp(-1,1)*0.5+0.5, nrow=4)
grid.save('generated.png')
```
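Under the hood, Euler sampling just integrates the learned velocity field from noise at `t=0` to an image at `t=1`. A minimal sketch of the idea (the packaged `euler_sample` may differ in signature and details):

```python
import torch

@torch.no_grad()
def euler_sample_sketch(model, shape, num_steps=50, device='cpu'):
    """Euler integration of dx/dt = v_theta(x, t) from noise (t=0) to image (t=1)."""
    x = torch.randn(shape, device=device)            # x_0 ~ N(0, I)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((shape[0],), i * dt, device=device)
        x = x + dt * model(x, t)                     # one Euler step along v_theta
    return x
```

Because flow matching trains the model toward a straight-line velocity field, even this simple first-order integrator produces usable samples in a few dozen steps; the Heun sampler trades two model calls per step for second-order accuracy.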
## File Structure
```
├── liquidflow/
│   ├── __init__.py            # Package exports
│   ├── model.py               # Core architecture (LiquidFlowNet, LiquidCfCCell, SelectiveSSM)
│   ├── losses.py              # Physics-informed flow matching loss + EMA
│   ├── sampling.py            # Euler & Heun ODE samplers
│   └── train.py               # Full training script with CLI
├── LiquidFlow_Training.ipynb  # Colab/Kaggle notebook
├── smoke_test.py              # Comprehensive CPU test suite (25 tests)
└── README.md
```
## Physics-Informed Loss
```
L = L_flow + λ_smooth · L_smooth + λ_tv · L_tv
```
| Term | Formula | Purpose |
|------|---------|---------|
| `L_flow` | `‖v_θ(x_t, t) - (x₁ - x₀)‖²` | Learn straight-line velocity field |
| `L_smooth` | `‖∇²x_pred‖²` (Laplacian) | Penalize high-frequency noise |
| `L_tv` | `‖∇x_pred‖₁` (total variation) | Edge-preserving smoothness |
Physics loss is **warmed up** over the first 500 steps.
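A sketch of how these terms can be combined, assuming the rectified-flow identity `x₁ ≈ x₀ + v_pred` to recover the image estimate. The λ values, the Laplacian kernel, and the linear warm-up schedule are illustrative, not the repository's exact settings:

```python
import torch
import torch.nn.functional as F

def physics_flow_loss_sketch(v_pred, x0, x1, step, warmup_steps=500,
                             lam_smooth=1e-3, lam_tv=1e-3):
    """Flow matching loss + warmed-up smoothness and TV penalties (sketch)."""
    flow = F.mse_loss(v_pred, x1 - x0)               # straight-line velocity target

    x_pred = x0 + v_pred                             # x1 estimate under rectified flow
    # L_smooth: squared Laplacian via a fixed 3x3 kernel, applied per channel
    k = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]],
                     device=x_pred.device).view(1, 1, 3, 3)
    c = x_pred.shape[1]
    lap = F.conv2d(x_pred, k.repeat(c, 1, 1, 1), padding=1, groups=c)
    smooth = lap.pow(2).mean()

    # L_tv: anisotropic total variation (L1 norm of finite differences)
    tv = (x_pred[..., :, 1:] - x_pred[..., :, :-1]).abs().mean() + \
         (x_pred[..., 1:, :] - x_pred[..., :-1, :]).abs().mean()

    warm = min(step / warmup_steps, 1.0)             # linear warm-up of physics terms
    return flow + warm * (lam_smooth * smooth + lam_tv * tv)
```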
## Recommended Experiments
| Goal | Dataset | Model | Img Size | Epochs | Time (T4) |
|------|---------|-------|----------|--------|-----------|
| Sanity check | CIFAR-10 | tiny | 32 | 20 | ~5 min |
| Baseline | CIFAR-10 | tiny | 128 | 100 | ~2 hrs |
| Quality | Flowers-102 | small | 128 | 200 | ~4 hrs |
| Faces | CelebA | small | 128 | 50 | ~6 hrs |
| High-res | CelebA | 512 | 512 | 100 | ~12 hrs |
## Mobile Export
The notebook includes TorchScript and ONNX export cells. The `tiny` model produces a ~24MB file for on-device inference.
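A TorchScript export along these lines is straightforward; here a stand-in module replaces a trained LiquidFlow network, and the filename is illustrative:

```python
import torch

# Stand-in for a trained LiquidFlow model (a real export would trace the
# trained network with its actual example inputs).
model = torch.nn.Sequential(torch.nn.Conv2d(3, 3, 3, padding=1)).eval()
example = torch.randn(1, 3, 128, 128)

scripted = torch.jit.trace(model, example)   # record the graph from a sample input
scripted.save("liquidflow_export.pt")        # self-contained file for on-device use
```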
## Verified (25/25 smoke tests pass)
- All 4 model variants: forward pass
- Backward pass: all parameters receive gradients
- Gradient health: no NaN, no Inf
- Loss convergence: finite across optimizer steps
- Individual components: LiquidCfCCell, SelectiveSSM, LiquidSSMBlock
- Scan patterns: 4 patterns, all invertible
- Sampling: Euler + Heun produce finite images
- EMA: apply/restore cycle
- Checkpoint: save/load round-trip
- Physics loss: all terms finite and positive
## References
1. Hasani et al., "Liquid Time-Constant Networks", AAAI 2021 ([2006.04439](https://arxiv.org/abs/2006.04439))
2. Hasani et al., "Closed-form Continuous-depth Models", Nature MI 2022
3. Gu & Dao, "Mamba: Linear-Time Sequence Modeling", 2023 ([2312.00752](https://arxiv.org/abs/2312.00752))
4. Teng et al., "DiM: Diffusion Mamba", 2024 ([2405.14224](https://arxiv.org/abs/2405.14224))
5. Hu et al., "ZigMa: Zigzag Mamba Diffusion", 2024 ([2403.13802](https://arxiv.org/abs/2403.13802))
6. Lipman et al., "Flow Matching for Generative Modeling", ICLR 2023 ([2210.02747](https://arxiv.org/abs/2210.02747))
7. Raissi et al., "Physics-Informed Neural Networks", JCP 2019 ([1711.10561](https://arxiv.org/abs/1711.10561))
8. Wang et al., "Gradient Pathologies in PINNs", 2020 ([2001.04536](https://arxiv.org/abs/2001.04536))
9. Bastek & Kochmann, "Physics-Informed Diffusion Models", 2024 ([2403.14404](https://arxiv.org/abs/2403.14404))
10. Zhu et al., "Vision Mamba", 2024 ([2401.09417](https://arxiv.org/abs/2401.09417))
## License
MIT