Add ArtFlow architecture specification
Browse files
README.md
ADDED
|
@@ -0,0 +1,31 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
# π¨ ArtFlow: Reasoning-Native Artistic Image Generation for Mobile Devices
|
| 2 |
+
|
| 3 |
+
## A Novel Architecture for Intelligent, Lightweight Illustration Generation
|
| 4 |
+
|
| 5 |
+
**Version:** 1.0
|
| 6 |
+
**Status:** Architecture Specification + Prototype Implementation
|
| 7 |
+
**Target:** 2-4GB RAM, 1024px native generation, anime/illustration focus
|
| 8 |
+
|
| 9 |
+
### π¬ Validated Prototype Results
|
| 10 |
+
```
|
| 11 |
+
π Parameter Count: 114.7M (backbone only, without text encoder/VAE)
|
| 12 |
+
πΎ Model Memory: 229 MB (FP16) / 115 MB (INT8)
|
| 13 |
+
π± Total inference: ~235 MB (well under 2GB mobile budget)
|
| 14 |
+
π Wavelet reconstruction: perfect (error < 1e-7)
|
| 15 |
+
π Zigzag scan: perfect round-trip
|
| 16 |
+
β
Forward pass: correct shapes
|
| 17 |
+
β
Backward pass: no NaN/Inf gradients
|
| 18 |
+
```
|
| 19 |
+
|
| 20 |
+
See `ARCHITECTURE.md` for the complete 1000+ line technical specification, and `artflow_model.py` for the validated PyTorch implementation.
|
| 21 |
+
|
| 22 |
+
### Key Novel Contributions
|
| 23 |
+
1. **WaveMamba**: Wavelet-decomposed Mamba denoising backbone (O(n) complexity)
|
| 24 |
+
2. **Recursive Latent Reasoning**: TRM/HRM-style reasoning within denoising steps
|
| 25 |
+
3. **ArtStyle Matrix**: Explicit, manipulable style space for illustration generation
|
| 26 |
+
4. **Liquid-dynamics Mood Control**: Physics-inspired mood modulation
|
| 27 |
+
5. **Art-Aware Velocity Scaling**: Frequency-weighted flow matching loss
|
| 28 |
+
6. **KAN-based Composition**: Kolmogorov-Arnold Networks for compositional rules
|
| 29 |
+
|
| 30 |
+
### Research Foundation
|
| 31 |
+
Synthesized from 40+ papers including MobileDiffusion, SnapGen, DreamLite, ZigMa, DiMSUM, DC-AE, TRM/HRM, Liquid Neural Networks, RWKV, KAN, Illustrious, and more.
|