krystv
/

LiquidFlow

+---
+license: mit
+tags:
+  - image-generation
+  - flow-matching
+  - liquid-neural-networks
+  - mamba
+  - state-space-models
+  - physics-informed
+  - lightweight
+  - mobile-friendly
+---
+# 🌊 LiquidFlow — Liquid-SSM Flow Matching Image Generator
+A **novel lightweight architecture** for image generation that combines:
+| Component | Source | Role |
+|-----------|--------|------|
+| **Liquid Time-Constant Networks** | [Hasani et al. 2020](https://arxiv.org/abs/2006.04439) | Adaptive ODE dynamics via CfC closed-form — bounded by construction |
+| **Selective State Space Models** | [Gu & Dao 2023 (Mamba)](https://arxiv.org/abs/2312.00752) | Linear-time long-range context, parallelizable scanning |
+| **Zigzag Scanning** | [ZigMa 2024](https://arxiv.org/abs/2403.13802) | 2D spatial awareness through alternating scan patterns |
+| **Physics-Informed Loss** | [Wang et al. 2020](https://arxiv.org/abs/2001.04536), [PIDM 2024](https://arxiv.org/abs/2403.14404) | Smoothness + TV regularization for training stability |
+| **Rectified Flow Matching** | [Lipman et al. 2022](https://arxiv.org/abs/2210.02747) | ODE-based generation — no noise schedule tuning needed |
+## 🎯 Key Properties
+- **Trainable on Google Colab free tier** (T4 16GB) and Kaggle
+- **Mobile-deployable** — tiny model is only ~6M params (~24MB)
+- **No custom CUDA kernels** — pure PyTorch, runs anywhere
+- **No training collapse/explosion** — sigmoid gating in Liquid CfC guarantees bounded dynamics
+- **No noise schedule tuning** — flow matching uses simple linear interpolation
+## 📐 Architecture
+```
+Noise x₀ ~ N(0,I)  ──→  LiquidFlow v_θ(xₜ, t)  ──→  Image x₁
+                           │
+                    ┌──────┴──────┐
+                    │  Patchify   │  (image → non-overlapping patches)
+                    │  + PosEmb   │  (2D learnable positions)
+                    │  + DepthConv│  (local structure preservation)
+                    └──────┬──────┘
+                           │
+              ┌────────────┼────────────┐
+              │    L × LiquidSSM Block   │
+              │  ┌──────────────────┐    │
+              │  │ AdaLN (t-cond)   │    │   ← DiT-style conditioning
+              │  │ Zigzag Scan      │    │   ← rotates scan pattern per layer
+              │  │ SelectiveSSM     │    │   ← Mamba-style, input-dependent A,B,C,Δ
+              │  │ + LiquidCfC      │    │   ← CfC gating: σ(-f_τ)⊙h + (1-σ(-f_τ))⊙f_x
+              │  │ + FFN            │    │   ← GELU feed-forward
+              │  │ + Skip Connect   │    │   ← U-Net style long skips
+              │  └──────────────────┘    │
+              └────────────┼────────────┘
+                           │
+                    ┌──────┴──────┐
+                    │  DepthConv  │  (local refinement)
+                    │  Unpatchify │  (patches → image)
+                    └──────┬──────┘
+                           │
+                     velocity v_θ (same shape as input)
+```
+### Core Innovation: Liquid CfC Cell
+Instead of solving the Liquid ODE numerically (sequential, slow):
+```
+dx/dt = -[1/τ + f(x,I,t)] * x + f(x,I,t)
+```
+We use the **Closed-form Continuous-depth (CfC)** solution (parallel, fast, stable):
+```python
+gate = sigmoid(-f_tau(x, h))    # time-constant gating
+new_h = gate * h + (1 - gate) * f_x(x, h)  # bounded update
+```
+The **sigmoid gating guarantees** that hidden states stay bounded — no explosion or collapse possible by construction.
+### Dual-Path Processing
+Each LiquidSSM Block has two parallel branches:
+1. **SSM Branch**: Selective scan (Mamba-style) with zigzag patterns → captures global spatial dependencies
+2. **Liquid Branch**: CfC cell → adds continuous-time adaptive dynamics
+A learnable mixing coefficient `α` balances them: `output = α·SSM + (1-α)·Liquid`
+## 📊 Model Variants
+| Variant | Params | Image Size | Patch | GPU VRAM (bs=16) | Use Case |
+|---------|--------|------------|-------|-----------------|----------|
+| `tiny` | 5.9M | 128×128 | 4 | ~4 GB | Quick experiments, mobile |
+| `small` | 13.7M | 128×128 | 4 | ~8 GB | Production 128×128 |
+| `base` | 37.6M | 256×256 | 8 | ~12 GB | High quality |
+| `512` | 38.1M | 512×512 | 16 | ~14 GB | High resolution |
+## 🚀 Quick Start
+### Colab / Kaggle (Recommended)
+Open the notebook: **`LiquidFlow_Training.ipynb`**
+It has interactive widgets for:
+- Dataset selection (CIFAR-10, Flowers-102, CelebA, Fashion-MNIST, AFHQ, custom folder)
+- Model size and all hyperparameters
+- Auto batch-size adjustment for your GPU
+### Command Line
+```bash
+pip install torch torchvision einops pillow matplotlib tqdm
+# Quick test (CIFAR-10 32×32)
+python liquidflow/train.py --model_size tiny --img_size 32 --dataset cifar10 --epochs 50 --batch_size 64
+# Production (Flowers 128×128)
+python liquidflow/train.py --model_size small --img_size 128 --dataset flowers --epochs 200 --batch_size 16
+# Custom images
+python liquidflow/train.py --model_size small --img_size 128 --dataset folder --data_dir /path/to/images
+```
+### Python API
+```python
+from liquidflow import liquidflow_small, euler_sample, make_grid_image
+import torch
+model = liquidflow_small(img_size=128)  # 13.7M params
+# ... after training ...
+model.eval()
+images = euler_sample(model, (16, 3, 128, 128), num_steps=50, device='cuda')
+grid = make_grid_image(images.clamp(-1,1)*0.5+0.5, nrow=4)
+grid.save('generated.png')
+```
+## 📦 File Structure
+```
+├── liquidflow/
+│   ├── __init__.py          # Package exports
+│   ├── model.py             # Core architecture (LiquidFlowNet, LiquidCfCCell, SelectiveSSM)
+│   ├── losses.py            # Physics-informed flow matching loss + EMA
+│   ├── sampling.py          # Euler & Heun ODE samplers
+│   └── train.py             # Full training script with CLI
+├── LiquidFlow_Training.ipynb  # 📓 Colab/Kaggle notebook
+├── smoke_test.py            # Comprehensive CPU test suite (25 tests)
+└── README.md
+```
+## 🔬 Physics-Informed Loss
+```
+L = L_flow + λ_smooth · L_smooth + λ_tv · L_tv
+```
+| Term | Formula | Purpose |
+|------|---------|---------|
+| `L_flow` | `‖v_θ(xₜ,t) - (x₁-x₀)‖²` | Learn straight-line velocity field |
+| `L_smooth` | `‖∇²x_pred‖²` (Laplacian) | Penalize high-frequency noise |
+| `L_tv` | `‖∇x_pred‖₁` (Total Variation) | Edge-preserving smoothness |
+Physics loss is **warmed up** over the first 500 steps.
+## 🧪 Recommended Experiments
+| Goal | Dataset | Model | Size | Epochs | Time (T4) |
+|------|---------|-------|------|--------|-----------|
+| Sanity check | CIFAR-10 | tiny | 32 | 20 | ~5 min |
+| Baseline | CIFAR-10 | tiny | 128 | 100 | ~2 hrs |
+| Quality | Flowers-102 | small | 128 | 200 | ~4 hrs |
+| Faces | CelebA | small | 128 | 50 | ~6 hrs |
+| High-res | CelebA | 512 | 512 | 100 | ~12 hrs |
+## 📱 Mobile Export
+The notebook includes TorchScript and ONNX export cells. The `tiny` model produces a ~24MB file for on-device inference.
+## ✅ Verified (25/25 smoke tests pass)
+- All 4 model variants: forward pass ✓
+- Backward pass: all parameters receive gradients ✓
+- Gradient health: no NaN, no Inf ✓
+- Loss convergence: finite across optimizer steps ✓
+- Individual components: LiquidCfCCell, SelectiveSSM, LiquidSSMBlock ✓
+- Scan patterns: 4 patterns, all invertible ✓
+- Sampling: Euler + Heun produce finite images ✓
+- EMA: apply/restore cycle ✓
+- Checkpoint: save/load round-trip ✓
+- Physics loss: all terms finite and positive ✓
+## 📚 References
+1. Hasani et al., "Liquid Time-Constant Networks", AAAI 2021 ([2006.04439](https://arxiv.org/abs/2006.04439))
+2. Hasani et al., "Closed-form Continuous-depth Models", Nature MI 2022
+3. Gu & Dao, "Mamba: Linear-Time Sequence Modeling", 2023 ([2312.00752](https://arxiv.org/abs/2312.00752))
+4. Teng et al., "DiM: Diffusion Mamba", 2024 ([2405.14224](https://arxiv.org/abs/2405.14224))
+5. Hu et al., "ZigMa: Zigzag Mamba Diffusion", 2024 ([2403.13802](https://arxiv.org/abs/2403.13802))
+6. Lipman et al., "Flow Matching for Generative Modeling", ICLR 2023
+7. Raissi et al., "Physics-Informed Neural Networks", JCP 2019 ([1711.10561](https://arxiv.org/abs/1711.10561))
+8. Wang et al., "Gradient Pathologies in PINNs", 2020 ([2001.04536](https://arxiv.org/abs/2001.04536))
+9. Bastek & Kochmann, "Physics-Informed Diffusion Models", 2024 ([2403.14404](https://arxiv.org/abs/2403.14404))
+10. Zhu et al., "Vision Mamba", 2024 ([2401.09417](https://arxiv.org/abs/2401.09417))
+## License
+MIT