krystv/ArtFlow

krystv committed on Apr 28

Commit

aae615d

verified

1 Parent(s): 2548d10

Update README with complete training notebook and stability mechanisms

Files changed (1)

README.md +62 -12

README.md CHANGED

@@ -2,8 +2,8 @@
 ## A Novel Architecture for Intelligent, Lightweight Illustration Generation
-**Version:** 1.0
-**Status:** Architecture Specification + Prototype Implementation
 **Target:** 2-4GB RAM, 1024px native generation, anime/illustration focus
 ### 🔬 Validated Prototype Results
@@ -15,17 +15,67 @@
 🔀 Zigzag scan: perfect round-trip
 ✅ Forward pass: correct shapes
 ✅ Backward pass: no NaN/Inf gradients
 ```
-See `ARCHITECTURE.md` for the complete 1000+ line technical specification, and `artflow_model.py` for the validated PyTorch implementation.
-### Key Novel Contributions
-1. **WaveMamba**: Wavelet-decomposed Mamba denoising backbone (O(n) complexity)
-2. **Recursive Latent Reasoning**: TRM/HRM-style reasoning within denoising steps
-3. **ArtStyle Matrix**: Explicit, manipulable style space for illustration generation
-4. **Liquid-dynamics Mood Control**: Physics-inspired mood modulation
-5. **Art-Aware Velocity Scaling**: Frequency-weighted flow matching loss
-6. **KAN-based Composition**: Kolmogorov-Arnold Networks for compositional rules
-### Research Foundation
-Synthesized from 40+ papers including MobileDiffusion, SnapGen, DreamLite, ZigMa, DiMSUM, DC-AE, TRM/HRM, Liquid Neural Networks, RWKV, KAN, Illustrious, and more.

 ## A Novel Architecture for Intelligent, Lightweight Illustration Generation
+**Version:** 1.1
+**Status:** Architecture Specification + Prototype Implementation + Training Notebook
 **Target:** 2-4GB RAM, 1024px native generation, anime/illustration focus
 ### 🔬 Validated Prototype Results
 🔀 Zigzag scan: perfect round-trip
 ✅ Forward pass: correct shapes
 ✅ Backward pass: no NaN/Inf gradients
+✅ 100-step training: stable (no oscillation, no explosion)
 ```
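The "zigzag scan: perfect round-trip" check can be illustrated with a minimal sketch. It assumes a row-wise serpentine ordering over a 2D grid; the actual scan pattern in `artflow_model.py` is not visible in this diff, and the function names `zigzag_flatten` and `zigzag_unflatten` are hypothetical.

```python
def zigzag_flatten(grid):
    """Flatten a 2D grid row by row, reversing every odd-indexed row."""
    seq = []
    for r, row in enumerate(grid):
        seq.extend(row if r % 2 == 0 else reversed(row))
    return seq


def zigzag_unflatten(seq, height, width):
    """Invert zigzag_flatten for a grid of the given shape."""
    if len(seq) != height * width:
        raise ValueError("sequence length does not match grid shape")
    grid = []
    for r in range(height):
        row = list(seq[r * width:(r + 1) * width])
        grid.append(row if r % 2 == 0 else row[::-1])
    return grid


grid = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
seq = zigzag_flatten(grid)  # [0, 1, 2, 5, 4, 3, 6, 7, 8]
assert zigzag_unflatten(seq, 3, 3) == grid
```

A round-trip test of this kind only shows that the scan is a permutation with a correct inverse; it says nothing about whether the ordering helps the sequence model.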
+### 📁 Repository Contents
+| File | Description |
+|------|-------------|
+| `ARCHITECTURE.md` | Complete technical specification (1038 lines, 15 sections) |
+| `artflow_model.py` | Validated PyTorch prototype covering all modules (1149 lines) |
+| `ArtFlow_Training.ipynb` | **Complete Colab/Kaggle training notebook** with all 5 stages |
+| `train_stage1.py` | Standalone Stage 1 training script |
+| `test_training.py` | 100-step training stability validation |
+### 🧪 Training Stability (Research-Backed)
+Each mechanism below targets a specific training failure mode; most come from published research, and the rest are standard practice or new to this project:
+| Problem | Prevention Mechanism | Paper |
+|---------|---------------------|-------|
+| Loss explodes | Grad clip (1.0) + LR warmup + zero-init output | DiT |
+| Loss oscillates | Cosine annealing + Min-SNR-γ weighting | [arXiv:2303.09556] |
+| Bad batch spikes | Pseudo-Huber loss + auto spike detection | [arXiv:2403.16728] |
+| Attention NaN | QK-RMSNorm prevents softmax saturation | SnapGen |
+| Modules interfere | Staged freeze/unfreeze per module | DreamLite |
+| Gradient vanishing | Residual connections + AdaLN everywhere | Standard |
+| Memory OOM | Gradient checkpointing + AMP + small micro-batch | Standard |
+| Training stalls | Gradient variance monitoring → auto LR reduction | Novel |
+| Style collapse | Trained separately before joint fine-tuning | USO |
+| High-freq artifacts | Art-aware frequency-weighted loss | Novel |
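Two of the mechanisms in the table, LR warmup with cosine annealing and the Pseudo-Huber loss, are easy to sketch in plain Python. This is an illustration under assumptions, not the notebook's code: the base learning rate (1e-4) and warmup length (1,000 steps) are hypothetical, the 50,000 total matches the Stage 1 step count in this README, and the loss uses the textbook Pseudo-Huber form, which may differ from the scheduled variant in the cited paper.

```python
import math


def lr_at(step, base_lr, warmup_steps, total_steps, min_lr=0.0):
    """Linear warmup to base_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def pseudo_huber(residual, delta=1.0):
    """Textbook Pseudo-Huber: quadratic near zero, roughly linear for large residuals."""
    return delta ** 2 * (math.sqrt(1.0 + (residual / delta) ** 2) - 1.0)


# Hypothetical hyperparameters, for illustration only.
BASE_LR, WARMUP, TOTAL = 1e-4, 1_000, 50_000

for step in (0, 999, 25_500, 50_000):
    print(step, lr_at(step, BASE_LR, WARMUP, TOTAL))

# An outlier residual of 100 costs about 99 instead of 5000 (0.5 * r^2),
# which is why a single bad batch moves the loss far less.
print(pseudo_huber(0.1), pseudo_huber(100.0))
```

The schedule is continuous at the end of warmup: the last warmup step and the first cosine step both return `base_lr`.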
+### 🏗️ 7 Novel Contributions
+1. **WaveMamba** — Wavelet-decomposed Mamba denoising backbone (O(n) complexity)
+2. **Recursive Latent Reasoning (RLR)** — TRM/HRM-style reasoning within denoising steps
+3. **ArtStyle Matrix** — Explicit, manipulable style vectors for illustration generation
+4. **Liquid-Dynamics Mood Control** — Physics-inspired mood modulation via adaptive time constants
+5. **Art-Aware Velocity Scaling** — Frequency-weighted flow matching loss for artistic quality
+6. **Deep Improvement Supervision** — Train reasoning recursions with progressively cleaner targets
+7. **KAN Composition Engine** — Kolmogorov-Arnold Networks for smooth compositional rules
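To make "wavelet-decomposed" and "frequency-weighted loss" concrete, here is a small 1D sketch. It assumes a one-level orthonormal Haar transform and fixed per-band weights; the README does not state which wavelet or weighting ArtFlow uses, so `haar_split`, `haar_merge`, `band_weighted_mse`, and the default weights are all hypothetical.

```python
import math

_S = 1.0 / math.sqrt(2.0)


def haar_split(signal):
    """One-level orthonormal Haar transform of an even-length 1D signal."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    low = [(a + b) * _S for a, b in pairs]   # local averages (low band)
    high = [(a - b) * _S for a, b in pairs]  # local differences (high band)
    return low, high


def haar_merge(low, high):
    """Inverse of haar_split."""
    out = []
    for a, d in zip(low, high):
        out.extend([(a + d) * _S, (a - d) * _S])
    return out


def band_weighted_mse(pred, target, w_low=1.0, w_high=2.0):
    """Squared error computed per band, then combined with per-band weights."""
    p_low, p_high = haar_split(pred)
    t_low, t_high = haar_split(target)
    low_err = sum((p - t) ** 2 for p, t in zip(p_low, t_low))
    high_err = sum((p - t) ** 2 for p, t in zip(p_high, t_high))
    return (w_low * low_err + w_high * high_err) / len(pred)


signal = [1.0, 3.0, 2.0, 6.0]
low, high = haar_split(signal)
restored = haar_merge(low, high)
assert all(abs(a - b) < 1e-12 for a, b in zip(restored, signal))
```

Because the transform is orthonormal, setting both weights to 1 reproduces the ordinary mean squared error; raising `w_high` penalizes errors in fine detail more heavily.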
+### 📚 5-Stage Training Pipeline
+```
+Stage 1: Base Generation (50K steps)  → Backbone learns denoising
+Stage 2: Style Matrix   (25K steps)  → Disentangled art style learning
+Stage 3: Resolution     (25K steps)  → Scale to 1024px + enable reasoning
+Stage 4: Concept & Mood (15K steps)  → Scene understanding + emotion
+Stage 5: Quality SFT    (5K steps)   → Human preference alignment
+```
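The stage table can be expressed as a simple schedule lookup. This sketch assumes the five stages run back to back on one global step counter, which the README does not state; `STAGES` copies the names and step counts listed in the pipeline block, and `stage_for_step` is a hypothetical helper.

```python
# Stage names and step counts as listed in the README's pipeline block.
STAGES = [
    ("Base Generation", 50_000),
    ("Style Matrix", 25_000),
    ("Resolution", 25_000),
    ("Concept & Mood", 15_000),
    ("Quality SFT", 5_000),
]

TOTAL_STEPS = sum(steps for _, steps in STAGES)  # 120_000


def stage_for_step(global_step):
    """Return (stage_number, stage_name, step_within_stage) for a global step."""
    if global_step < 0:
        raise ValueError("global_step must be non-negative")
    start = 0
    for number, (name, steps) in enumerate(STAGES, start=1):
        if global_step < start + steps:
            return number, name, global_step - start
        start += steps
    raise ValueError("global_step is past the end of the schedule")


print(TOTAL_STEPS)
print(stage_for_step(50_000))
```

Summed, the listed stages come to 120K steps.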
+All stages are designed for **Colab T4 / Kaggle P100** (single free GPU):
+- Batch size 2 × gradient accumulation 32 = effective batch 64
+- Mixed precision (bf16/fp16)
+- Gradient checkpointing
+- Progressive resolution
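The "batch size 2 × gradient accumulation 32" setting relies on the fact that averaged micro-batch gradients equal the full-batch gradient when each micro-batch loss is scaled by `1 / accum_steps`. The sketch below checks this on a toy scalar model in plain Python; the model, data, and function names are hypothetical and stand in for the real training loop.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean squared error with respect to w for the model y = w * x."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, micro_batch=2, accum_steps=32):
    """Sum micro-batch gradients, each scaled by 1 / accum_steps."""
    if len(xs) != micro_batch * accum_steps:
        raise ValueError("data size must equal micro_batch * accum_steps")
    total = 0.0
    for i in range(accum_steps):
        lo, hi = i * micro_batch, (i + 1) * micro_batch
        total += mse_grad(w, xs[lo:hi], ys[lo:hi]) / accum_steps
    return total


# 64 toy samples: 32 micro-batches of 2, matching the README's setting.
xs = [0.1 * i for i in range(64)]
ys = [3.0 * x + 1.0 for x in xs]

full = mse_grad(0.5, xs, ys)
accum = accumulated_grad(0.5, xs, ys)
assert abs(full - accum) < 1e-9
```

The equivalence holds for mean-reduced losses with equal-sized micro-batches. Layers that use batch statistics still see only two samples at a time, so accumulation does not reproduce a true batch of 64 for them.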
+### 📖 Research Foundation
+Synthesized from 40+ papers including:
+MobileDiffusion, SnapGen, DreamLite, ZigMa, DiMSUM, DC-AE, TRM/HRM,
+Liquid Neural Networks, RWKV, KAN, Illustrious, Rectified Flow++,
+Stable Velocity, USO, Vision Mamba, Min-SNR, Pseudo-Huber Loss, and more.
+See `ARCHITECTURE.md` for the complete research synthesis with citations.