krystv commited on
Commit
d223c0a
Β·
verified Β·
1 Parent(s): 8f8313c

Initial release: LiquidFlow architecture + training code + notebook

Browse files
Files changed (1) hide show
  1. README.md +206 -0
README.md ADDED
@@ -0,0 +1,206 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: mit
3
+ tags:
4
+ - image-generation
5
+ - flow-matching
6
+ - liquid-neural-networks
7
+ - mamba
8
+ - state-space-models
9
+ - physics-informed
10
+ - lightweight
11
+ - mobile-friendly
12
+ ---
13
+
14
+ # 🌊 LiquidFlow β€” Liquid-SSM Flow Matching Image Generator
15
+
16
+ A **novel lightweight architecture** for image generation that combines:
17
+
18
+ | Component | Source | Role |
19
+ |-----------|--------|------|
20
+ | **Liquid Time-Constant Networks** | [Hasani et al. 2020](https://arxiv.org/abs/2006.04439) | Adaptive ODE dynamics via CfC closed-form β€” bounded by construction |
21
+ | **Selective State Space Models** | [Gu & Dao 2023 (Mamba)](https://arxiv.org/abs/2312.00752) | Linear-time long-range context, parallelizable scanning |
22
+ | **Zigzag Scanning** | [ZigMa 2024](https://arxiv.org/abs/2403.13802) | 2D spatial awareness through alternating scan patterns |
23
+ | **Physics-Informed Loss** | [Wang et al. 2020](https://arxiv.org/abs/2001.04536), [PIDM 2024](https://arxiv.org/abs/2403.14404) | Smoothness + TV regularization for training stability |
24
+ | **Rectified Flow Matching** | [Lipman et al. 2022](https://arxiv.org/abs/2210.02747) | ODE-based generation β€” no noise schedule tuning needed |
25
+
26
+ ## 🎯 Key Properties
27
+
28
+ - **Trainable on Google Colab free tier** (T4 16GB) and Kaggle
29
+ - **Mobile-deployable** β€” tiny model is only ~6M params (~24MB)
30
+ - **No custom CUDA kernels** β€” pure PyTorch, runs anywhere
31
+ - **No training collapse/explosion** β€” sigmoid gating in Liquid CfC guarantees bounded dynamics
32
+ - **No noise schedule tuning** β€” flow matching uses simple linear interpolation
33
+
34
+ ## πŸ“ Architecture
35
+
36
+ ```
37
+ Noise xβ‚€ ~ N(0,I) ──→ LiquidFlow v_ΞΈ(xβ‚œ, t) ──→ Image x₁
38
+ β”‚
39
+ β”Œβ”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”
40
+ β”‚ Patchify β”‚ (image β†’ non-overlapping patches)
41
+ β”‚ + PosEmb β”‚ (2D learnable positions)
42
+ β”‚ + DepthConvβ”‚ (local structure preservation)
43
+ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
44
+ β”‚
45
+ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
46
+ β”‚ L Γ— LiquidSSM Block β”‚
47
+ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
48
+ β”‚ β”‚ AdaLN (t-cond) β”‚ β”‚ ← DiT-style conditioning
49
+ β”‚ β”‚ Zigzag Scan β”‚ β”‚ ← rotates scan pattern per layer
50
+ β”‚ β”‚ SelectiveSSM β”‚ β”‚ ← Mamba-style, input-dependent A,B,C,Ξ”
51
+ β”‚ β”‚ + LiquidCfC β”‚ β”‚ ← CfC gating: Οƒ(-f_Ο„)βŠ™h + (1-Οƒ(-f_Ο„))βŠ™f_x
52
+ β”‚ β”‚ + FFN β”‚ β”‚ ← GELU feed-forward
53
+ β”‚ β”‚ + Skip Connect β”‚ β”‚ ← U-Net style long skips
54
+ β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚
55
+ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
56
+ β”‚
57
+ β”Œβ”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”
58
+ β”‚ DepthConv β”‚ (local refinement)
59
+ β”‚ Unpatchify β”‚ (patches β†’ image)
60
+ β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
61
+ β”‚
62
+ velocity v_ΞΈ (same shape as input)
63
+ ```
64
+
65
+ ### Core Innovation: Liquid CfC Cell
66
+
67
+ Instead of solving the Liquid ODE numerically (sequential, slow):
68
+ ```
69
+ dx/dt = -[1/Ο„ + f(x,I,t)] * x + f(x,I,t)
70
+ ```
71
+
72
+ We use the **Closed-form Continuous-depth (CfC)** solution (parallel, fast, stable):
73
+ ```python
74
+ gate = sigmoid(-f_tau(x, h)) # time-constant gating
75
+ new_h = gate * h + (1 - gate) * f_x(x, h) # bounded update
76
+ ```
77
+
78
+ The **sigmoid gating guarantees** that hidden states stay bounded β€” no explosion or collapse possible by construction.
79
+
80
+ ### Dual-Path Processing
81
+
82
+ Each LiquidSSM Block has two parallel branches:
83
+ 1. **SSM Branch**: Selective scan (Mamba-style) with zigzag patterns β†’ captures global spatial dependencies
84
+ 2. **Liquid Branch**: CfC cell β†’ adds continuous-time adaptive dynamics
85
+
86
+ A learnable mixing coefficient `Ξ±` balances them: `output = Ξ±Β·SSM + (1-Ξ±)Β·Liquid`
87
+
88
+ ## πŸ“Š Model Variants
89
+
90
+ | Variant | Params | Image Size | Patch | GPU VRAM (bs=16) | Use Case |
91
+ |---------|--------|------------|-------|-----------------|----------|
92
+ | `tiny` | 5.9M | 128Γ—128 | 4 | ~4 GB | Quick experiments, mobile |
93
+ | `small` | 13.7M | 128Γ—128 | 4 | ~8 GB | Production 128Γ—128 |
94
+ | `base` | 37.6M | 256Γ—256 | 8 | ~12 GB | High quality |
95
+ | `512` | 38.1M | 512Γ—512 | 16 | ~14 GB | High resolution |
96
+
97
+ ## πŸš€ Quick Start
98
+
99
+ ### Colab / Kaggle (Recommended)
100
+
101
+ Open the notebook: **`LiquidFlow_Training.ipynb`**
102
+
103
+ It has interactive widgets for:
104
+ - Dataset selection (CIFAR-10, Flowers-102, CelebA, Fashion-MNIST, AFHQ, custom folder)
105
+ - Model size and all hyperparameters
106
+ - Auto batch-size adjustment for your GPU
107
+
108
+ ### Command Line
109
+
110
+ ```bash
111
+ pip install torch torchvision einops pillow matplotlib tqdm
112
+
113
+ # Quick test (CIFAR-10 32Γ—32)
114
+ python liquidflow/train.py --model_size tiny --img_size 32 --dataset cifar10 --epochs 50 --batch_size 64
115
+
116
+ # Production (Flowers 128Γ—128)
117
+ python liquidflow/train.py --model_size small --img_size 128 --dataset flowers --epochs 200 --batch_size 16
118
+
119
+ # Custom images
120
+ python liquidflow/train.py --model_size small --img_size 128 --dataset folder --data_dir /path/to/images
121
+ ```
122
+
123
+ ### Python API
124
+
125
+ ```python
126
+ from liquidflow import liquidflow_small, euler_sample, make_grid_image
127
+ import torch
128
+
129
+ model = liquidflow_small(img_size=128) # 13.7M params
130
+ # ... after training ...
131
+ model.eval()
132
+ images = euler_sample(model, (16, 3, 128, 128), num_steps=50, device='cuda')
133
+ grid = make_grid_image(images.clamp(-1,1)*0.5+0.5, nrow=4)
134
+ grid.save('generated.png')
135
+ ```
136
+
137
+ ## πŸ“¦ File Structure
138
+
139
+ ```
140
+ β”œβ”€β”€ liquidflow/
141
+ β”‚ β”œβ”€β”€ __init__.py # Package exports
142
+ β”‚ β”œβ”€β”€ model.py # Core architecture (LiquidFlowNet, LiquidCfCCell, SelectiveSSM)
143
+ β”‚ β”œβ”€β”€ losses.py # Physics-informed flow matching loss + EMA
144
+ β”‚ β”œβ”€β”€ sampling.py # Euler & Heun ODE samplers
145
+ β”‚ └── train.py # Full training script with CLI
146
+ β”œβ”€β”€ LiquidFlow_Training.ipynb # πŸ““ Colab/Kaggle notebook
147
+ β”œβ”€β”€ smoke_test.py # Comprehensive CPU test suite (25 tests)
148
+ └── README.md
149
+ ```
150
+
151
+ ## πŸ”¬ Physics-Informed Loss
152
+
153
+ ```
154
+ L = L_flow + Ξ»_smooth Β· L_smooth + Ξ»_tv Β· L_tv
155
+ ```
156
+
157
+ | Term | Formula | Purpose |
158
+ |------|---------|---------|
159
+ | `L_flow` | `β€–v_ΞΈ(xβ‚œ,t) - (x₁-xβ‚€)β€–Β²` | Learn straight-line velocity field |
160
+ | `L_smooth` | `β€–βˆ‡Β²x_predβ€–Β²` (Laplacian) | Penalize high-frequency noise |
161
+ | `L_tv` | `β€–βˆ‡x_pred‖₁` (Total Variation) | Edge-preserving smoothness |
162
+
163
+ Physics loss is **warmed up** over the first 500 steps.
164
+
165
+ ## πŸ§ͺ Recommended Experiments
166
+
167
+ | Goal | Dataset | Model | Size | Epochs | Time (T4) |
168
+ |------|---------|-------|------|--------|-----------|
169
+ | Sanity check | CIFAR-10 | tiny | 32 | 20 | ~5 min |
170
+ | Baseline | CIFAR-10 | tiny | 128 | 100 | ~2 hrs |
171
+ | Quality | Flowers-102 | small | 128 | 200 | ~4 hrs |
172
+ | Faces | CelebA | small | 128 | 50 | ~6 hrs |
173
+ | High-res | CelebA | 512 | 512 | 100 | ~12 hrs |
174
+
175
+ ## πŸ“± Mobile Export
176
+
177
+ The notebook includes TorchScript and ONNX export cells. The `tiny` model produces a ~24MB file for on-device inference.
178
+
179
+ ## βœ… Verified (25/25 smoke tests pass)
180
+
181
+ - All 4 model variants: forward pass βœ“
182
+ - Backward pass: all parameters receive gradients βœ“
183
+ - Gradient health: no NaN, no Inf βœ“
184
+ - Loss convergence: finite across optimizer steps βœ“
185
+ - Individual components: LiquidCfCCell, SelectiveSSM, LiquidSSMBlock βœ“
186
+ - Scan patterns: 4 patterns, all invertible βœ“
187
+ - Sampling: Euler + Heun produce finite images βœ“
188
+ - EMA: apply/restore cycle βœ“
189
+ - Checkpoint: save/load round-trip βœ“
190
+ - Physics loss: all terms finite and positive βœ“
191
+
192
+ ## πŸ“š References
193
+
194
+ 1. Hasani et al., "Liquid Time-Constant Networks", AAAI 2021 ([2006.04439](https://arxiv.org/abs/2006.04439))
195
+ 2. Hasani et al., "Closed-form Continuous-depth Models", Nature MI 2022
196
+ 3. Gu & Dao, "Mamba: Linear-Time Sequence Modeling", 2023 ([2312.00752](https://arxiv.org/abs/2312.00752))
197
+ 4. Teng et al., "DiM: Diffusion Mamba", 2024 ([2405.14224](https://arxiv.org/abs/2405.14224))
198
+ 5. Hu et al., "ZigMa: Zigzag Mamba Diffusion", 2024 ([2403.13802](https://arxiv.org/abs/2403.13802))
199
+ 6. Lipman et al., "Flow Matching for Generative Modeling", ICLR 2023
200
+ 7. Raissi et al., "Physics-Informed Neural Networks", JCP 2019 ([1711.10561](https://arxiv.org/abs/1711.10561))
201
+ 8. Wang et al., "Gradient Pathologies in PINNs", 2020 ([2001.04536](https://arxiv.org/abs/2001.04536))
202
+ 9. Bastek & Kochmann, "Physics-Informed Diffusion Models", 2024 ([2403.14404](https://arxiv.org/abs/2403.14404))
203
+ 10. Zhu et al., "Vision Mamba", 2024 ([2401.09417](https://arxiv.org/abs/2401.09417))
204
+
205
+ ## License
206
+ MIT