---
license: mit
tags:
- image-generation
- flow-matching
- liquid-neural-networks
- mamba
- state-space-models
- physics-informed
- lightweight
- mobile-friendly
---

# LiquidFlow: Liquid-SSM Flow Matching Image Generator

A **novel lightweight architecture** for image generation that combines:

| Component | Source | Role |
|-----------|--------|------|
| **Liquid Time-Constant Networks** | [Hasani et al. 2020](https://arxiv.org/abs/2006.04439) | Adaptive ODE dynamics via the closed-form CfC solution, bounded by construction |
| **Selective State Space Models** | [Gu & Dao 2023 (Mamba)](https://arxiv.org/abs/2312.00752) | Linear-time long-range context, parallelizable scanning |
| **Zigzag Scanning** | [ZigMa 2024](https://arxiv.org/abs/2403.13802) | 2D spatial awareness through alternating scan patterns |
| **Physics-Informed Loss** | [Wang et al. 2020](https://arxiv.org/abs/2001.04536), [PIDM 2024](https://arxiv.org/abs/2403.14404) | Smoothness + TV regularization for training stability |
| **Rectified Flow Matching** | [Lipman et al. 2022](https://arxiv.org/abs/2210.02747) | ODE-based generation; no noise schedule tuning needed |

## Key Properties

- **Trainable on Google Colab free tier** (T4, 16 GB) and Kaggle
- **Mobile-deployable**: the tiny model is only ~6M params (~24 MB)
- **No custom CUDA kernels**: pure PyTorch, runs anywhere
- **No training collapse/explosion**: sigmoid gating in the Liquid CfC keeps dynamics bounded by construction
- **No noise schedule tuning**: flow matching uses simple linear interpolation (see the sketch below)

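For intuition, the entire flow matching training target fits in a few lines. This is an illustrative sketch, not code from the repo; the random batch `x1` stands in for real images:

```python
import torch

# Rectified flow target: no noise schedule, just a straight line between
# noise x0 and data x1.
x1 = torch.rand(16, 3, 128, 128) * 2 - 1       # dummy images scaled to [-1, 1]
x0 = torch.randn_like(x1)                      # Gaussian noise endpoint
t = torch.rand(x1.size(0)).view(-1, 1, 1, 1)   # one random time per sample
xt = (1 - t) * x0 + t * x1                     # point on the straight path
target_v = x1 - x0                             # constant velocity the model regresses
```
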
## Architecture

```
Noise x₀ ~ N(0,I) ──▶ LiquidFlow v_θ(xₜ, t) ──▶ Image x₁
                             │
                      ┌──────┴──────┐
                      │ Patchify    │ (image → non-overlapping patches)
                      │ + PosEmb    │ (2D learnable positions)
                      │ + DepthConv │ (local structure preservation)
                      └──────┬──────┘
                             │
                ┌────────────┴────────────┐
                │   L × LiquidSSM Block   │
                │  ┌───────────────────┐  │
                │  │ AdaLN (t-cond)    │  │ ← DiT-style conditioning
                │  │ Zigzag Scan       │  │ ← rotates scan pattern per layer
                │  │ SelectiveSSM      │  │ ← Mamba-style, input-dependent A,B,C,Δ
                │  │ + LiquidCfC       │  │ ← CfC gating: σ(−f_τ)⊙h + (1−σ(−f_τ))⊙f_x
                │  │ + FFN             │  │ ← GELU feed-forward
                │  │ + Skip Connect    │  │ ← U-Net style long skips
                │  └───────────────────┘  │
                └────────────┬────────────┘
                             │
                      ┌──────┴──────┐
                      │ DepthConv   │ (local refinement)
                      │ Unpatchify  │ (patches → image)
                      └──────┬──────┘
                             │
              velocity v_θ (same shape as input)
```

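The patchify/unpatchify steps are plain reshapes. A minimal sketch with `einops` (which the Quick Start installs); the repo's `model.py` may implement them differently:

```python
import torch
from einops import rearrange

x = torch.randn(2, 3, 128, 128)   # (B, C, H, W)
p = 4                             # patch size used by the tiny/small variants

# Split the image into non-overlapping p x p patches, one token per patch.
tokens = rearrange(x, 'b c (h p1) (w p2) -> b (h w) (p1 p2 c)', p1=p, p2=p)
print(tokens.shape)               # torch.Size([2, 1024, 48])

# Unpatchify is the exact inverse reshape.
x_back = rearrange(tokens, 'b (h w) (p1 p2 c) -> b c (h p1) (w p2)',
                   h=128 // p, p1=p, p2=p)
assert torch.equal(x, x_back)     # round-trip is lossless
```
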
### Core Innovation: Liquid CfC Cell

Instead of solving the Liquid ODE numerically (sequential, slow):
```
dx/dt = -[1/τ + f(x,I,t)] * x + f(x,I,t)
```

we use the **Closed-form Continuous-depth (CfC)** solution (parallel, fast, stable):
```python
gate = torch.sigmoid(-f_tau(x, h))         # adaptive time-constant gating in (0, 1)
new_h = gate * h + (1 - gate) * f_x(x, h)  # convex update of the hidden state
```

The **sigmoid gating guarantees** that each update is a convex combination of the previous state and the candidate, so hidden states stay bounded: no explosion or collapse is possible by construction.

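Putting the pieces together, a minimal self-contained cell might look like the following. The single linear heads and the `tanh` on the candidate are assumptions for illustration; the repo's `LiquidCfCCell` may use deeper heads:

```python
import torch
import torch.nn as nn

class CfCCellSketch(nn.Module):
    """Illustrative Liquid CfC cell (not the repo's exact LiquidCfCCell)."""

    def __init__(self, dim: int):
        super().__init__()
        self.f_tau = nn.Linear(2 * dim, dim)  # controls the adaptive time constant
        self.f_x = nn.Linear(2 * dim, dim)    # proposes the candidate state

    def forward(self, x: torch.Tensor, h: torch.Tensor) -> torch.Tensor:
        xh = torch.cat([x, h], dim=-1)
        gate = torch.sigmoid(-self.f_tau(xh))  # gate in (0, 1)
        cand = torch.tanh(self.f_x(xh))        # bounded candidate (assumption)
        return gate * h + (1 - gate) * cand    # convex, hence bounded, update

cell = CfCCellSketch(dim=64)
h = torch.zeros(8, 64)
x = torch.randn(8, 64)
h = cell(x, h)  # one update step; repeat along the token sequence
```
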
### Dual-Path Processing

Each LiquidSSM Block has two parallel branches:
1. **SSM Branch**: selective scan (Mamba-style) with zigzag patterns, capturing global spatial dependencies
2. **Liquid Branch**: CfC cell, adding continuous-time adaptive dynamics

A learnable mixing coefficient `α` balances them: `output = α·SSM + (1-α)·Liquid`

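One simple way to realize the learnable mix, sketched under the assumption that `α` is a single sigmoid-squashed scalar (the repo may parameterize it per channel or per layer):

```python
import torch
import torch.nn as nn

class BranchMix(nn.Module):
    """Hypothetical sketch of the alpha-weighted branch combination."""

    def __init__(self):
        super().__init__()
        self.mix_logit = nn.Parameter(torch.zeros(1))  # alpha = 0.5 at init

    def forward(self, ssm_out: torch.Tensor, liquid_out: torch.Tensor) -> torch.Tensor:
        alpha = torch.sigmoid(self.mix_logit)          # keeps alpha in (0, 1)
        return alpha * ssm_out + (1 - alpha) * liquid_out
```
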
## Model Variants

| Variant | Params | Image Size | Patch | GPU VRAM (bs=16) | Use Case |
|---------|--------|------------|-------|------------------|----------|
| `tiny`  | 5.9M   | 128×128    | 4     | ~4 GB            | Quick experiments, mobile |
| `small` | 13.7M  | 128×128    | 4     | ~8 GB            | Production 128×128 |
| `base`  | 37.6M  | 256×256    | 8     | ~12 GB           | High quality |
| `512`   | 38.1M  | 512×512    | 16    | ~14 GB           | High resolution |

## Quick Start

### Colab / Kaggle (Recommended)

Open the notebook: **`LiquidFlow_Training.ipynb`**

It has interactive widgets for:
- Dataset selection (CIFAR-10, Flowers-102, CelebA, Fashion-MNIST, AFHQ, custom folder)
- Model size and all hyperparameters
- Auto batch-size adjustment for your GPU

### Command Line

```bash
pip install torch torchvision einops pillow matplotlib tqdm

# Quick test (CIFAR-10 32×32)
python liquidflow/train.py --model_size tiny --img_size 32 --dataset cifar10 --epochs 50 --batch_size 64

# Production (Flowers 128×128)
python liquidflow/train.py --model_size small --img_size 128 --dataset flowers --epochs 200 --batch_size 16

# Custom images
python liquidflow/train.py --model_size small --img_size 128 --dataset folder --data_dir /path/to/images
```

### Python API

```python
import torch
from liquidflow import liquidflow_small, euler_sample, make_grid_image

model = liquidflow_small(img_size=128)  # 13.7M params
# ... after training ...
model.eval()
images = euler_sample(model, (16, 3, 128, 128), num_steps=50, device='cuda')
grid = make_grid_image(images.clamp(-1, 1) * 0.5 + 0.5, nrow=4)  # map [-1,1] to [0,1]
grid.save('generated.png')
```

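For intuition about what `euler_sample` does, here is a minimal Euler integrator for the learned ODE dx/dt = v_θ(x, t). This is a sketch assuming `model(x, t)` returns the velocity; the packaged sampler may handle time discretization and conditioning differently:

```python
import torch

@torch.no_grad()
def euler_sample_sketch(model, shape, num_steps=50, device='cpu'):
    """Integrate dx/dt = v_theta(x, t) from noise (t=0) to an image (t=1)."""
    x = torch.randn(shape, device=device)            # x_0 ~ N(0, I)
    ts = torch.linspace(0.0, 1.0, num_steps + 1, device=device)
    for i in range(num_steps):
        t = ts[i].expand(shape[0])                   # broadcast time to the batch
        v = model(x, t)                              # predicted velocity field
        x = x + (ts[i + 1] - ts[i]) * v              # one Euler step
    return x                                         # approximate sample x_1
```
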
## File Structure

```
├── liquidflow/
│   ├── __init__.py            # Package exports
│   ├── model.py               # Core architecture (LiquidFlowNet, LiquidCfCCell, SelectiveSSM)
│   ├── losses.py              # Physics-informed flow matching loss + EMA
│   ├── sampling.py            # Euler & Heun ODE samplers
│   └── train.py               # Full training script with CLI
├── LiquidFlow_Training.ipynb  # Colab/Kaggle notebook
├── smoke_test.py              # Comprehensive CPU test suite (25 tests)
└── README.md
```

## Physics-Informed Loss

```
L = L_flow + λ_smooth · L_smooth + λ_tv · L_tv
```

| Term | Formula | Purpose |
|------|---------|---------|
| `L_flow` | `‖v_θ(xₜ,t) − (x₁−x₀)‖²` | Learn straight-line velocity field |
| `L_smooth` | `‖∇²x_pred‖²` (Laplacian) | Penalize high-frequency noise |
| `L_tv` | `‖∇x_pred‖₁` (total variation) | Edge-preserving smoothness |

Physics loss is **warmed up** over the first 500 steps.

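A compact sketch of how the two physics terms can be computed on a predicted image batch, assuming `x_pred` is an estimate of x₁ (the exact definitions live in `losses.py` and may differ):

```python
import torch
import torch.nn.functional as F

def physics_terms(x_pred: torch.Tensor):
    """Return (L_smooth, L_tv) for a batch of shape (B, C, H, W)."""
    c = x_pred.size(1)
    # Depthwise discrete Laplacian: penalizes high-frequency content.
    lap = torch.tensor([[0., 1., 0.],
                        [1., -4., 1.],
                        [0., 1., 0.]], device=x_pred.device)
    lap = lap.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    l_smooth = F.conv2d(x_pred, lap, groups=c).pow(2).mean()
    # Anisotropic total variation: L1 norm of horizontal + vertical differences.
    l_tv = (x_pred[..., :, 1:] - x_pred[..., :, :-1]).abs().mean() \
         + (x_pred[..., 1:, :] - x_pred[..., :-1, :]).abs().mean()
    return l_smooth, l_tv

# One possible warmup schedule (linear ramp is an assumption):
# warm = min(step / 500.0, 1.0)
# loss = l_flow + warm * (lam_smooth * l_smooth + lam_tv * l_tv)
```
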
## Recommended Experiments

| Goal | Dataset | Model | Image size | Epochs | Time (T4) |
|------|---------|-------|------------|--------|-----------|
| Sanity check | CIFAR-10 | tiny | 32 | 20 | ~5 min |
| Baseline | CIFAR-10 | tiny | 128 | 100 | ~2 hrs |
| Quality | Flowers-102 | small | 128 | 200 | ~4 hrs |
| Faces | CelebA | small | 128 | 50 | ~6 hrs |
| High-res | CelebA | 512 | 512 | 100 | ~12 hrs |

## Mobile Export

The notebook includes TorchScript and ONNX export cells. The `tiny` model produces a ~24 MB file for on-device inference.

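The export cells amount to something like the following. This is a hedged sketch assuming `model(x, t)` takes plain tensors and is traceable; the notebook's actual cells may differ in input shapes and opset:

```python
import torch

# `model` is a trained LiquidFlow net, e.g. from the Python API section above.
model.eval()
example_x = torch.randn(1, 3, 128, 128)  # dummy input image
example_t = torch.zeros(1)               # dummy timestep

# TorchScript via tracing (requires control flow independent of data values).
traced = torch.jit.trace(model, (example_x, example_t))
traced.save("liquidflow_tiny.pt")

# ONNX export of the same model.
torch.onnx.export(model, (example_x, example_t), "liquidflow_tiny.onnx",
                  input_names=["x", "t"], output_names=["velocity"],
                  opset_version=17)
```
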
## Verified (25/25 smoke tests pass)

- All 4 model variants: forward pass ✓
- Backward pass: all parameters receive gradients ✓
- Gradient health: no NaN, no Inf ✓
- Loss convergence: finite across optimizer steps ✓
- Individual components: LiquidCfCCell, SelectiveSSM, LiquidSSMBlock ✓
- Scan patterns: 4 patterns, all invertible ✓
- Sampling: Euler + Heun produce finite images ✓
- EMA: apply/restore cycle ✓
- Checkpoint: save/load round-trip ✓
- Physics loss: all terms finite and positive ✓

## References

1. Hasani et al., "Liquid Time-Constant Networks", AAAI 2021 ([2006.04439](https://arxiv.org/abs/2006.04439))
2. Hasani et al., "Closed-form Continuous-depth Models", Nature MI 2022
3. Gu & Dao, "Mamba: Linear-Time Sequence Modeling", 2023 ([2312.00752](https://arxiv.org/abs/2312.00752))
4. Teng et al., "DiM: Diffusion Mamba", 2024 ([2405.14224](https://arxiv.org/abs/2405.14224))
5. Hu et al., "ZigMa: Zigzag Mamba Diffusion", 2024 ([2403.13802](https://arxiv.org/abs/2403.13802))
6. Lipman et al., "Flow Matching for Generative Modeling", ICLR 2023 ([2210.02747](https://arxiv.org/abs/2210.02747))
7. Raissi et al., "Physics-Informed Neural Networks", JCP 2019 ([1711.10561](https://arxiv.org/abs/1711.10561))
8. Wang et al., "Gradient Pathologies in PINNs", 2020 ([2001.04536](https://arxiv.org/abs/2001.04536))
9. Bastek & Kochmann, "Physics-Informed Diffusion Models", 2024 ([2403.14404](https://arxiv.org/abs/2403.14404))
10. Zhu et al., "Vision Mamba", 2024 ([2401.09417](https://arxiv.org/abs/2401.09417))

## License

MIT