Commit aae615d (verified) by krystv · 1 Parent(s): 2548d10

Update README with complete training notebook and stability mechanisms

  1. README.md +62 -12
README.md CHANGED
@@ -2,8 +2,8 @@
  ## A Novel Architecture for Intelligent, Lightweight Illustration Generation

- **Version:** 1.0
- **Status:** Architecture Specification + Prototype Implementation
  **Target:** 2-4GB RAM, 1024px native generation, anime/illustration focus

  ### 🔬 Validated Prototype Results
@@ -15,17 +15,67 @@
  🔀 Zigzag scan: perfect round-trip
  ✅ Forward pass: correct shapes
  ✅ Backward pass: no NaN/Inf gradients
  ```

- See `ARCHITECTURE.md` for the complete 1000+ line technical specification, and `artflow_model.py` for the validated PyTorch implementation.

- ### Key Novel Contributions
- 1. **WaveMamba**: Wavelet-decomposed Mamba denoising backbone (O(n) complexity)
- 2. **Recursive Latent Reasoning**: TRM/HRM-style reasoning within denoising steps
- 3. **ArtStyle Matrix**: Explicit, manipulable style space for illustration generation
- 4. **Liquid-dynamics Mood Control**: Physics-inspired mood modulation
- 5. **Art-Aware Velocity Scaling**: Frequency-weighted flow matching loss
- 6. **KAN-based Composition**: Kolmogorov-Arnold Networks for compositional rules

- ### Research Foundation
- Synthesized from 40+ papers including MobileDiffusion, SnapGen, DreamLite, ZigMa, DiMSUM, DC-AE, TRM/HRM, Liquid Neural Networks, RWKV, KAN, Illustrious, and more.
 
  ## A Novel Architecture for Intelligent, Lightweight Illustration Generation

+ **Version:** 1.1
+ **Status:** Architecture Specification + Prototype Implementation + Training Notebook
  **Target:** 2-4GB RAM, 1024px native generation, anime/illustration focus

  ### 🔬 Validated Prototype Results
 
  🔀 Zigzag scan: perfect round-trip
  ✅ Forward pass: correct shapes
  ✅ Backward pass: no NaN/Inf gradients
+ ✅ 100-step training: stable (no oscillation, no explosion)
  ```
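
The results above do not show what the zigzag round-trip check actually exercises. As a rough illustration only, the sketch below assumes a simple row-wise serpentine scan; the prototype in `artflow_model.py` may use a different pattern. The names `zigzag_order`, `zigzag_scan`, and `zigzag_unscan` are hypothetical and are not taken from the repository. The code reorders a row-major token list and then inverts the reordering, which is what a "perfect round-trip" would mean under that assumption.

```python
def zigzag_order(height, width):
    """Flat indices of an H x W grid in serpentine (row-wise zigzag) order."""
    order = []
    for row in range(height):
        # Even rows run left to right, odd rows run right to left.
        cols = range(width) if row % 2 == 0 else range(width - 1, -1, -1)
        order.extend(row * width + col for col in cols)
    return order


def zigzag_scan(tokens, height, width):
    """Reorder a row-major token list into zigzag order."""
    return [tokens[i] for i in zigzag_order(height, width)]


def zigzag_unscan(scanned, height, width):
    """Invert zigzag_scan, restoring row-major order."""
    restored = [None] * (height * width)
    for position, index in enumerate(zigzag_order(height, width)):
        restored[index] = scanned[position]
    return restored


grid = list(range(12))  # a 3 x 4 grid in row-major order
scanned = zigzag_scan(grid, 3, 4)
assert scanned == [0, 1, 2, 3, 7, 6, 5, 4, 8, 9, 10, 11]
assert zigzag_unscan(scanned, 3, 4) == grid  # the round trip is lossless
```

Because the scan is a pure permutation of indices, the inverse is exact for any grid size, which makes it a cheap invariant to test before training.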

+ ### 📁 Repository Contents

+ | File | Description |
+ |------|-------------|
+ | `ARCHITECTURE.md` | Complete technical specification (1038 lines, 15 sections) |
+ | `artflow_model.py` | Validated PyTorch prototype (1149 lines, all modules) |
+ | `ArtFlow_Training.ipynb` | **Complete Colab/Kaggle training notebook** with all 5 stages |
+ | `train_stage1.py` | Standalone Stage 1 training script |
+ | `test_training.py` | 100-step training stability validation |

+ ### 🧪 Training Stability (Research-Backed)
+
+ Each failure mode below is paired with its prevention mechanism and the source of that mechanism (a paper, standard practice, or novel to this project):
+
+ | Problem | Prevention Mechanism | Source |
+ |---------|---------------------|--------|
+ | Loss explodes | Grad clip (1.0) + LR warmup + zero-init output | DiT |
+ | Loss oscillates | Cosine annealing + Min-SNR-γ weighting | [arXiv:2303.09556] |
+ | Bad batch spikes | Pseudo-Huber loss + auto spike detection | [arXiv:2403.16728] |
+ | Attention NaN | QK-RMSNorm prevents softmax saturation | SnapGen |
+ | Modules interfere | Staged freeze/unfreeze per module | DreamLite |
+ | Gradient vanishing | Residual connections + AdaLN everywhere | Standard |
+ | Out of memory (OOM) | Gradient checkpointing + AMP + small micro-batch | Standard |
+ | Training stalls | Gradient variance monitoring → auto LR reduction | Novel |
+ | Style collapse | Style Matrix trained separately before joint fine-tuning | USO |
+ | High-freq artifacts | Art-aware frequency-weighted loss | Novel |
+
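
Several rows in the stability table reduce to short formulas. The sketch below is a minimal, dependency-free illustration of four of them (LR warmup with cosine annealing, Pseudo-Huber loss, Min-SNR-γ weighting, and gradient-norm clipping), not the code from `ArtFlow_Training.ipynb`. All function names, the warmup length, and the defaults `c=1.0` and `gamma=5.0` are assumptions made for the example; only the clip threshold of 1.0 comes from the table. The Min-SNR weight is shown in its noise-prediction form, and how the notebook adapts it to flow matching is not stated here.

```python
import math


def lr_at_step(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Linear warmup to base_lr, then cosine annealing down to min_lr."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


def pseudo_huber(residual, c=1.0):
    """Pseudo-Huber loss: roughly quadratic near zero, roughly linear for outliers."""
    return math.sqrt(residual * residual + c * c) - c


def min_snr_weight(snr, gamma=5.0):
    """Min-SNR-gamma weight for a noise-prediction loss: min(SNR, gamma) / SNR."""
    return min(snr, gamma) / snr


def clip_by_global_norm(grads, max_norm=1.0):
    """Scale a flat list of gradient values so their L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (norm + 1e-12))
    return [g * scale for g in grads]


# A large residual is penalised far less than it would be under squared error.
assert pseudo_huber(10.0) < 10.0 ** 2
# High-SNR (nearly clean) timesteps are down-weighted, low-SNR ones are not.
assert min_snr_weight(10.0) == 0.5 and min_snr_weight(2.0) == 1.0
```

In a PyTorch training loop the same roles are usually filled by `torch.nn.utils.clip_grad_norm_` and `torch.optim.lr_scheduler.CosineAnnealingLR`; the pure-Python versions here only make the arithmetic visible.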
+ ### 🏗️ 7 Novel Contributions
+
+ 1. **WaveMamba**: Wavelet-decomposed Mamba denoising backbone (O(n) complexity)
+ 2. **Recursive Latent Reasoning (RLR)**: TRM/HRM-style reasoning within denoising steps
+ 3. **ArtStyle Matrix**: Explicit, manipulable style vectors for illustration generation
+ 4. **Liquid-Dynamics Mood Control**: Physics-inspired mood modulation via adaptive time constants
+ 5. **Art-Aware Velocity Scaling**: Frequency-weighted flow matching loss for artistic quality
+ 6. **Deep Improvement Supervision**: Train reasoning recursions with progressively cleaner targets
+ 7. **KAN Composition Engine**: Kolmogorov-Arnold Networks for smooth compositional rules
+
+ ### 📚 5-Stage Training Pipeline
+
+ ```
+ Stage 1: Base Generation (50K steps) → Backbone learns denoising
+ Stage 2: Style Matrix (25K steps) → Disentangled art style learning
+ Stage 3: Resolution (25K steps) → Scale to 1024px + enable reasoning
+ Stage 4: Concept & Mood (15K steps) → Scene understanding + emotion
+ Stage 5: Quality SFT (5K steps) → Human preference alignment
+ ```
+
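
The stage listing above can be read as a step schedule. The sketch below is one hypothetical way to encode it, assuming a single global step counter that runs across all five stages; the notebook may instead run each stage as an independent job. `Stage`, `STAGES`, and `stage_for_step` are illustrative names, while the stage names and step counts are copied from the listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    steps: int


# Step counts copied from the pipeline listing.
STAGES = [
    Stage("Base Generation", 50_000),
    Stage("Style Matrix", 25_000),
    Stage("Resolution", 25_000),
    Stage("Concept & Mood", 15_000),
    Stage("Quality SFT", 5_000),
]


def stage_for_step(global_step):
    """Return (stage_number, stage, step_within_stage) for a 0-based global step."""
    start = 0
    for number, stage in enumerate(STAGES, start=1):
        if global_step < start + stage.steps:
            return number, stage, global_step - start
        start += stage.steps
    raise ValueError(f"step {global_step} is past the end of the schedule")


total_steps = sum(stage.steps for stage in STAGES)
assert total_steps == 120_000  # 50K + 25K + 25K + 15K + 5K
```

A lookup like this gives a training loop one place to decide which modules to freeze or which resolution to use at a given step.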
+ All stages are designed for **Colab T4 / Kaggle P100** (a single free GPU):
+ - Batch size 2 × gradient accumulation 32 = effective batch 64
+ - Mixed precision (bf16/fp16)
+ - Gradient checkpointing
+ - Progressive resolution
+
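
The effective-batch arithmetic in the list above (2 × 32 = 64) relies on accumulated micro-batch gradients matching a single large-batch gradient. The toy example below checks that claim for a one-parameter model `y = w * x` with a mean-squared-error loss, under the assumption that every micro-batch has the same size and each micro-batch loss is scaled by `1 / ACCUM_STEPS`. It is a sketch of the idea in plain Python, not the notebook's training loop, and the function names are hypothetical.

```python
MICRO_BATCH = 2
ACCUM_STEPS = 32
EFFECTIVE_BATCH = MICRO_BATCH * ACCUM_STEPS  # 64, as stated in the README


def mse_grad(w, batch):
    """Gradient of the mean squared error for the toy model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, samples, micro_batch=MICRO_BATCH):
    """Average per-micro-batch gradients, as gradient accumulation does."""
    chunks = [samples[i:i + micro_batch] for i in range(0, len(samples), micro_batch)]
    total = 0.0
    for chunk in chunks:
        # Dividing by the chunk count mirrors scaling each loss by 1 / ACCUM_STEPS.
        total += mse_grad(w, chunk) / len(chunks)
    return total


samples = [(float(i), 3.0 * i + 1.0) for i in range(EFFECTIVE_BATCH)]
full = mse_grad(0.5, samples)
accum = accumulated_grad(0.5, samples)
assert abs(full - accum) <= 1e-9 * abs(full)
```

The equality holds only when micro-batches are equally sized; a short final micro-batch would need its own weighting.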
+ ### 📖 Research Foundation
+
+ Synthesized from 40+ papers including:
+ MobileDiffusion, SnapGen, DreamLite, ZigMa, DiMSUM, DC-AE, TRM/HRM,
+ Liquid Neural Networks, RWKV, KAN, Illustrious, Rectified Flow++,
+ Stable Velocity, USO, Vision Mamba, Min-SNR, Pseudo-Huber Loss, and more.
+
+ See `ARCHITECTURE.md` for the complete research synthesis with citations.