Update README with complete training notebook and stability mechanisms
README.md
@@ -2,8 +2,8 @@
## A Novel Architecture for Intelligent, Lightweight Illustration Generation

**Version:** 1.1
**Status:** Architecture Specification + Prototype Implementation + Training Notebook
**Target:** 2-4 GB RAM, 1024px native generation, anime/illustration focus

### 🔬 Validated Prototype Results
@@ -15,17 +15,67 @@
Zigzag scan: perfect round-trip
✅ Forward pass: correct shapes
✅ Backward pass: no NaN/Inf gradients
✅ 100-step training: stable (no oscillation, no explosion)
```
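
The zigzag scan round-trip check can be illustrated with a minimal sketch. It assumes a boustrophedon (snake) ordering over an H×W grid of tokens in which every other row is reversed, in the spirit of ZigMa's zigzag scanning. The actual scan pattern in `artflow_model.py` may differ, and `zigzag_indices`, `zigzag_scan`, and `zigzag_unscan` are hypothetical names.

```python
import torch


def zigzag_indices(h: int, w: int) -> torch.Tensor:
    """Flat indices of an h x w grid in snake order: even rows left-to-right,
    odd rows right-to-left (a hypothetical scan pattern for illustration)."""
    grid = torch.arange(h * w).view(h, w)
    grid[1::2] = grid[1::2].flip(-1)  # reverse every other row
    return grid.reshape(-1)


def zigzag_scan(x: torch.Tensor) -> torch.Tensor:
    """(B, C, H, W) feature map -> (B, H*W, C) token sequence in zigzag order."""
    _, _, h, w = x.shape
    idx = zigzag_indices(h, w).to(x.device)
    tokens = x.flatten(2).transpose(1, 2)  # (B, H*W, C) in raster order
    return tokens[:, idx]


def zigzag_unscan(tokens: torch.Tensor, h: int, w: int) -> torch.Tensor:
    """Invert zigzag_scan: (B, H*W, C) -> (B, C, H, W)."""
    b, _, c = tokens.shape
    idx = zigzag_indices(h, w).to(tokens.device)
    inverse = torch.argsort(idx)  # inverse permutation of the scan order
    raster = tokens[:, inverse]
    return raster.transpose(1, 2).reshape(b, c, h, w)


x = torch.randn(2, 8, 4, 6)
assert torch.equal(zigzag_unscan(zigzag_scan(x), 4, 6), x)  # exact round-trip
```

Because the scan is a pure permutation, its inverse is `argsort` of the index vector, so the round-trip is bit-exact rather than approximate.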

### Repository Contents

| File | Description |
|------|-------------|
| `ARCHITECTURE.md` | Complete technical specification (1038 lines, 15 sections) |
| `artflow_model.py` | Validated PyTorch prototype covering all modules (1149 lines) |
| `ArtFlow_Training.ipynb` | **Complete Colab/Kaggle training notebook** with all 5 stages |
| `train_stage1.py` | Standalone Stage 1 training script |
| `test_training.py` | 100-step training stability validation |

### 🧪 Training Stability (Research-Backed)

Each known failure mode is paired with a prevention mechanism and its source. "Standard" marks common practice; "Novel" marks mechanisms introduced in this project.

| Failure mode | Prevention mechanism | Source |
|--------------|----------------------|--------|
| Loss explodes | Gradient clipping (max norm 1.0) + LR warmup + zero-initialized output layers | DiT |
| Loss oscillates | Cosine annealing + Min-SNR-γ weighting | [arXiv:2303.09556] |
| Bad-batch spikes | Pseudo-Huber loss + automatic spike detection | [arXiv:2403.16728] |
| Attention NaN | QK-RMSNorm to prevent softmax saturation | SnapGen |
| Module interference | Staged freeze/unfreeze per module | DreamLite |
| Vanishing gradients | Residual connections + AdaLN throughout | Standard |
| Out-of-memory (OOM) | Gradient checkpointing + AMP + small micro-batches | Standard |
| Training stalls | Gradient-variance monitoring → automatic LR reduction | Novel |
| Style collapse | Style module trained separately before joint fine-tuning | USO |
| High-frequency artifacts | Art-aware frequency-weighted loss | Novel |
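
Three of the mechanisms in the table (Pseudo-Huber loss, Min-SNR-γ weighting, and spike detection) can be sketched in a few lines of PyTorch. This is an illustrative sketch under stated assumptions, not the notebook's code: the Pseudo-Huber constant `c=0.03`, `gamma=5.0` (a commonly used Min-SNR value), the linear noise path used to compute SNR, and the `SpikeDetector` thresholds are all assumptions. The Min-SNR weight is shown in its ε-prediction form; how it is adapted to a velocity (flow matching) target is not specified here.

```python
import torch


def pseudo_huber_loss(pred: torch.Tensor, target: torch.Tensor, c: float = 0.03) -> torch.Tensor:
    """Per-sample Pseudo-Huber loss sqrt(d^2 + c^2) - c, averaged over non-batch dims.
    Quadratic for small errors and roughly linear for large ones, so a bad batch
    yields bounded gradients. c=0.03 is an assumed value."""
    d2 = (pred - target).pow(2)
    return (torch.sqrt(d2 + c * c) - c).flatten(1).mean(dim=1)


def rectified_flow_snr(t: torch.Tensor) -> torch.Tensor:
    """SNR of the assumed linear path x_t = (1 - t) * x0 + t * noise: ((1 - t) / t)^2."""
    t = t.clamp(1e-5, 1 - 1e-5)  # avoid division by zero at the endpoints
    return ((1 - t) / t).pow(2)


def min_snr_weight(snr: torch.Tensor, gamma: float = 5.0) -> torch.Tensor:
    """Min-SNR-gamma weight in its epsilon-prediction form: min(SNR, gamma) / SNR."""
    return snr.clamp(max=gamma) / snr


class SpikeDetector:
    """Hypothetical helper: flags a loss as a spike when it exceeds `factor`
    times an exponential moving average (EMA) of recent non-spike losses."""

    def __init__(self, factor: float = 3.0, decay: float = 0.99, warmup: int = 10):
        self.factor, self.decay, self.warmup = factor, decay, warmup
        self.ema = None
        self.steps = 0

    def is_spike(self, loss: float) -> bool:
        self.steps += 1
        if self.ema is None:  # first observation seeds the EMA
            self.ema = loss
            return False
        spike = self.steps > self.warmup and loss > self.factor * self.ema
        if not spike:  # keep spikes out of the running average
            self.ema = self.decay * self.ema + (1 - self.decay) * loss
        return spike


# Usage: weight per-sample losses by Min-SNR, then reduce to a scalar.
pred, target = torch.randn(4, 3, 8, 8), torch.randn(4, 3, 8, 8)
t = torch.rand(4)
loss = (min_snr_weight(rectified_flow_snr(t)) * pseudo_huber_loss(pred, target)).mean()
```

In a training loop, a step whose loss `SpikeDetector.is_spike` flags would typically be skipped (no optimizer update) rather than allowed to corrupt the weights.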

### 7 Novel Contributions

1. **WaveMamba**: Wavelet-decomposed Mamba denoising backbone (O(n) complexity in sequence length)
2. **Recursive Latent Reasoning (RLR)**: TRM/HRM-style recursive reasoning within each denoising step
3. **ArtStyle Matrix**: Explicit, manipulable style vectors for illustration generation
4. **Liquid-Dynamics Mood Control**: Physics-inspired mood modulation via adaptive time constants
5. **Art-Aware Velocity Scaling**: Frequency-weighted flow matching loss for artistic quality
6. **Deep Improvement Supervision**: Reasoning recursions trained against progressively cleaner targets
7. **KAN Composition Engine**: Kolmogorov-Arnold Networks for smooth compositional rules
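
The wavelet decomposition behind WaveMamba can be illustrated with a single-level 2D Haar transform, which splits a feature map into one low-frequency band (LL) and three high-frequency detail bands at half resolution, and inverts exactly. This sketch assumes the Haar wavelet with orthonormal scaling; which wavelet ArtFlow uses and how the bands are routed to the Mamba blocks are not specified in this README. `haar_dwt2` and `haar_idwt2` are hypothetical names.

```python
import torch


def haar_dwt2(x: torch.Tensor):
    """Single-level orthonormal 2D Haar transform.
    (B, C, H, W) with even H, W -> four (B, C, H/2, W/2) bands: LL, LH, HL, HH."""
    a = x[..., 0::2, 0::2]  # top-left pixel of each 2x2 block
    b = x[..., 0::2, 1::2]  # top-right
    c = x[..., 1::2, 0::2]  # bottom-left
    d = x[..., 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2  # local average (low frequency)
    lh = (a + b - c - d) / 2  # top row minus bottom row
    hl = (a - b + c - d) / 2  # left column minus right column
    hh = (a - b - c + d) / 2  # diagonal detail
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh) -> torch.Tensor:
    """Exact inverse of haar_dwt2: reassemble each 2x2 block."""
    a = (ll + lh + hl + hh) / 2
    b = (ll + lh - hl - hh) / 2
    c = (ll - lh + hl - hh) / 2
    d = (ll - lh - hl + hh) / 2
    bsz, ch, h, w = ll.shape
    out = ll.new_empty(bsz, ch, 2 * h, 2 * w)
    out[..., 0::2, 0::2] = a
    out[..., 0::2, 1::2] = b
    out[..., 1::2, 0::2] = c
    out[..., 1::2, 1::2] = d
    return out


x = torch.randn(2, 3, 8, 8)
bands = haar_dwt2(x)
assert torch.allclose(haar_idwt2(*bands), x, atol=1e-6)  # perfect reconstruction
```

A constant patch puts all of its energy in LL and none in the detail bands, so edges and fine linework concentrate in LH/HL/HH. That separation is what lets a frequency-aware backbone or loss treat the two kinds of content differently.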

### 5-Stage Training Pipeline

```
Stage 1: Base Generation (50K steps) → Backbone learns denoising
Stage 2: Style Matrix    (25K steps) → Disentangled art style learning
Stage 3: Resolution      (25K steps) → Scale to 1024px + enable reasoning
Stage 4: Concept & Mood  (15K steps) → Scene understanding + emotion
Stage 5: Quality SFT     (5K steps)  → Human preference alignment
```
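
The staged schedule, combined with the per-module freeze/unfreeze listed under Training Stability, can be expressed as a small config plus a helper that toggles `requires_grad`. The step counts come from the pipeline above (120K steps in total). The submodule names (`backbone`, `style_matrix`, `reasoning`, `mood`), the `ToyModel` stand-in, and the set of modules trained in each stage are hypothetical placeholders, not the prototype's actual layout.

```python
from dataclasses import dataclass

import torch.nn as nn


@dataclass(frozen=True)
class Stage:
    name: str
    steps: int
    trainable: tuple  # top-level submodules left unfrozen in this stage (hypothetical)


# Step counts follow the pipeline; trainable sets are illustrative guesses.
STAGES = (
    Stage("base_generation", 50_000, ("backbone",)),
    Stage("style_matrix", 25_000, ("style_matrix",)),
    Stage("resolution", 25_000, ("backbone", "reasoning")),
    Stage("concept_mood", 15_000, ("mood",)),
    Stage("quality_sft", 5_000, ("backbone", "style_matrix", "reasoning", "mood")),
)


def apply_stage(model: nn.Module, stage: Stage) -> int:
    """Freeze every top-level submodule except those in stage.trainable.
    Returns the number of trainable parameters (useful for logging)."""
    for name, module in model.named_children():
        module.requires_grad_(name in stage.trainable)
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


class ToyModel(nn.Module):
    """Stand-in with hypothetical submodule names; not the real ArtFlow model."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(8, 8)
        self.style_matrix = nn.Linear(4, 8)
        self.reasoning = nn.Linear(8, 8)
        self.mood = nn.Linear(2, 8)


model = ToyModel()
for stage in STAGES:
    n = apply_stage(model, stage)
    print(f"{stage.name}: {stage.steps} steps, {n} trainable params")
```

Freezing at the submodule level keeps the optimizer from updating modules that a later stage depends on, which is the interference the staged schedule is meant to avoid.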

All stages are designed to fit a single free GPU (**Colab T4 / Kaggle P100**):
- Micro-batch size 2 × gradient accumulation 32 = effective batch size 64
- Mixed precision (fp16 on T4/P100, which lack native bf16 support; bf16 on GPUs that support it)
- Gradient checkpointing
- Progressive resolution

### Research Foundation

Synthesized from 40+ papers, including MobileDiffusion, SnapGen, DreamLite, ZigMa, DiMSUM, DC-AE, TRM/HRM, Liquid Neural Networks, RWKV, KAN, Illustrious, Rectified Flow++, Stable Velocity, USO, Vision Mamba, Min-SNR, and Pseudo-Huber Loss.

See `ARCHITECTURE.md` for the complete research synthesis with citations.