tefoteknik committed
Commit a0d90ec · verified · 1 parent: a09d793

Phase 7: Curriculum Learning (20K steps, BPC 1.78)

Files changed (1): README.md (+103 -33)

README.md CHANGED
@@ -1,75 +1,143 @@
  ```
 
- ### Training
  ```bash
- python train.py
  ```
 
  ### Inference
  ```bash
- python generate.py
  ```
 
- ### System 2 Diagnostics
  ```bash
- python inspect_reasoning.py
  ```
 
  ## Architecture
 
  ```
- Bytes → Encoder (RoPE) → Linear Attention → Reasoning Loop → Local RNN → Bytes
-         (Patches)        (Global Context)   (3 steps)       (Autoregressive)
  ```
 
- ### Components
 
  - **ByteLatentEncoder:** Patches bytes into latent vectors with RoPE
- - **LinearAttention:** $O(N)$ causal attention with ELU feature maps
  - **RecurrentReasoningBlock:** 3-step iterative thinking loop (System 2)
  - **LocalAutoregressiveHead:** GRU-based byte decoder
 
- See [docs/architecture.md](docs/architecture.md) for details.
 
  ## Features
 
  ✅ **No Tokenization** - Universal byte-level processing
- ✅ **Linear Complexity** - Scales to long contexts
  ✅ **Active Reasoning** - Verified thinking loop (Δz = 12.7)
- ✅ **Stable Training** - No NaN, robust gradient flow
- ✅ **Temperature Sampling** - Diverse inference outputs
 
  ## Documentation
 
  - [Architecture Guide](docs/architecture.md) - Technical deep dive
- - [Training Guide](docs/training.md) - How to train from scratch
  - [Inference Guide](docs/inference.md) - Generation and sampling
  - [API Reference](docs/api.md) - Code documentation
 
- ## Results
 
- ### Quantitative
- - **BPC:** 2.2578 (enwik8, 5000 steps)
- - **Training Time:** 15 minutes (T4 GPU)
- - **Stability:** 0 NaN occurrences
 
- ### Qualitative
- ```
- Prompt: "The history of "
- Output: "Tomadination of the [[New Gouple de aparty]] with the June
- competition became at the..."
- ```
 
- - Wikipedia syntax learned (`[[...]]`)
- - Clause structure emerging
- - "Thinking pause" (whitespace before output)
 
  ## Citation
 
  ```bibtex
  @software{agiformer2025,
- title={AGIFORMER: Byte-Level Language Model with System 2 Reasoning},
  author={inkbytefo},
  year={2025},
  url={https://github.com/inkbytefo/agi-former}
  }
  ```
@@ -81,11 +149,13 @@ MIT License - see [LICENSE](LICENSE) file for details.
  ## Acknowledgments
 
  - Built with PyTorch
- - Trained on enwik8 dataset
- - Inspired by Linear Transformers and System 2 reasoning research
 
  ---
 
  **Developer:** inkbytefo
- **Contact:** [GitHub](https://github.com/inkbytefo)
- **Status:** Proof of Concept - Complete
+ # AGIFORMER: Byte-Level Language Model with Neuroplasticity
+
+ > **Status:** Phase 7 - Curriculum Learning ✅ **Complete**
+ > **Latest Achievement:** 20K curriculum training with 77% BPC reduction
+
+ A research implementation of a byte-level language model featuring:
+ - 🧠 **Hebbian Memory** with dynamic neuroplasticity
+ - 📚 **Curriculum Learning** (3-stage developmental approach)
+ - 🔄 **System 2 Reasoning** (iterative thinking loop)
+ - 🚀 **Linear Complexity** attention mechanism
+
+ ## Quick Start
+
+ ### Installation
+ ```bash
+ pip install torch datasets tqdm
  ```
 
+ ### Training (Curriculum Learning)
  ```bash
+ python train_curriculum.py  # 20K steps, 3 curriculum stages
  ```
 
  ### Inference
  ```bash
+ python generate.py best_model_curriculum.pth
  ```
 
+ ### Testing
  ```bash
+ python test_recall.py best_model_curriculum.pth  # Memory test
+ python inspect_reasoning.py                      # System 2 diagnostics
  ```
 
  ## Architecture
 
  ```
+ Bytes → Encoder (RoPE) → Hebbian Memory → Reasoning Loop → Local RNN → Bytes
+         (Patches)        (Dynamic λ)      (3 steps)       (Autoregressive)
  ```
 
+ ### Core Components
+
  - **ByteLatentEncoder:** Patches bytes into latent vectors with RoPE
+ - **HebbianMemory:** Fast weights with learnable decay + neuroplasticity (α)
  - **RecurrentReasoningBlock:** 3-step iterative thinking loop (System 2)
  - **LocalAutoregressiveHead:** GRU-based byte decoder
 
+ See [docs/architecture.md](docs/architecture.md) for technical details.
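
The fast-weight mechanism behind the **HebbianMemory** component can be sketched as follows. This is a dependency-free toy illustration of the decay-plus-outer-product update, not the repo's actual module; the names `hebbian_step`, `lam`, and `alpha`, and the tiny dimensions, are assumptions for demonstration only.

```python
# Toy Hebbian fast-weight memory (hypothetical names, not the repo's API).
# Write:   M_t = lam * M_{t-1} + alpha * outer(v_t, k_t)
# Readout: o_t = M_t @ q_t

def outer(v, k):
    return [[vi * kj for kj in k] for vi in v]

def matvec(M, q):
    return [sum(m * qj for m, qj in zip(row, q)) for row in M]

def hebbian_step(M, k, v, lam=0.9, alpha=0.5):
    """One write to the fast-weight matrix M (decay lam, plasticity alpha)."""
    upd = outer(v, k)
    return [[lam * M[i][j] + alpha * upd[i][j] for j in range(len(k))]
            for i in range(len(v))]

# Usage: store v along key k, then retrieve with query q = k.
d = 3
M = [[0.0] * d for _ in range(d)]
k = [1.0, 0.0, 0.0]
v = [0.5, -1.0, 2.0]
M = hebbian_step(M, k, v, lam=0.9, alpha=1.0)
o = matvec(M, k)   # reads back alpha * v, since k is unit-norm
print(o)           # [0.5, -1.0, 2.0]
```

With a higher `alpha` the write dominates the decayed old memory, which is the intuition behind the α schedule described below in the curriculum section.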
 
  ## Features
 
  ✅ **No Tokenization** - Universal byte-level processing
+ ✅ **Linear Complexity** - O(N) attention with Hebbian memory
+ ✅ **Neuroplasticity** - Dynamic memory consolidation (α: 0.1 → 0.99)
+ ✅ **Curriculum Learning** - 3-stage developmental training
  ✅ **Active Reasoning** - Verified thinking loop (Δz = 12.7)
+ ✅ **AMP Compatible** - Mixed precision training with stability fixes
+
+ ## Curriculum Learning (Phase 7)
+
+ ### Training Stages
+
+ | Stage | Steps | Plasticity (α) | Data | Purpose |
+ |-------|-------|----------------|------|---------|
+ | **1. Childhood** | 0-3K | 0.10 | Dictionary | Lexical grounding |
+ | **2. Youth** | 3K-8K | 0.50 | Stories | Syntactic scaffolding |
+ | **3. Adulthood** | 8K-20K | 0.99 | Wikipedia | Semantic expansion |
+
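The staged schedule in the table above can be sketched as a simple step function. This is a hypothetical sketch for illustration; the actual schedule (and the function name `plasticity_alpha`) lives in `train_curriculum.py` and may differ.

```python
# Sketch of the 3-stage plasticity schedule from the table above
# (step-based switching is an assumption based on the stage boundaries).

def plasticity_alpha(step):
    """Return the Hebbian plasticity alpha for a given training step."""
    if step < 3_000:       # Stage 1: Childhood (dictionary data)
        return 0.10
    elif step < 8_000:     # Stage 2: Youth (stories)
        return 0.50
    else:                  # Stage 3: Adulthood (Wikipedia)
        return 0.99

print(plasticity_alpha(1_000), plasticity_alpha(5_000), plasticity_alpha(15_000))
# 0.1 0.5 0.99
```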
+ ### Results (20K Steps, Turkish Training)
+
+ **Metrics:**
+ - **Final BPC:** 1.85 (↓77% from initialization)
+ - **Best Val BPC:** 1.78
+ - **Training Time:** ~50 minutes (CUDA GPU)
+ - **Stability:** 0 NaN occurrences across 20K steps
+
+ **Progress:**
+ ```
+ Step 0:   BPC = 8.04  (Random initialization)
+ Step 5K:  BPC = 2.23  (Initial curriculum complete)
+ Step 10K: BPC = 1.98  (Mid-training)
+ Step 20K: BPC = 1.85  (Final)
+ ```
+
+ **Improvement:** 6.19 BPC absolute reduction (77%)
+
+ ## Critical Fix: AMP Stability
+
+ **Problem:** Float16 overflow in Hebbian Memory with low plasticity (α = 0.1)
+ **Solution:** Force float32 computation in the memory module
+
+ ```python
+ @torch.amp.autocast('cuda', enabled=False)
+ def forward(self, x):
+     input_dtype = x.dtype
+     x = x.float()  # Bypass AMP for numerical stability
+     # ... Hebbian computation ...
+     return out.to(input_dtype)
+ ```
+
+ This fix enables stable 20K+ step training with AMP enabled.
 
  ## Documentation
 
  - [Architecture Guide](docs/architecture.md) - Technical deep dive
+ - [Training Guide](docs/training.md) - Training from scratch
  - [Inference Guide](docs/inference.md) - Generation and sampling
  - [API Reference](docs/api.md) - Code documentation
+ - [RFC 007: Curriculum Learning](docs/RFC_007_Curriculum_Learning.md) - Phase 7 design
 
+ ## Model Files
 
+ - `best_model_curriculum.pth` - Best checkpoint (Val BPC: 1.78)
+ - `last_model_curriculum.pth` - Final model state (20K steps)
+ - `metrics_curriculum.json` - Full training metrics
 
+ ## Next Steps
+
+ ### Recommended Improvements
+
+ 1. **Extended Training:** 30K-50K steps for further convergence
+ 2. **Larger Model:** Increase to d_model=768, n_layers=8
+ 3. **Longer Context:** Extend to a 2048-token window
+ 4. **Fine-tuning:** Domain-specific Turkish datasets
+
+ ### Research Directions
+
+ - Adaptive plasticity scheduling
+ - Multi-stage curriculum optimization
+ - Cross-lingual transfer learning
+ - Sparse Hebbian memory
 
  ## Citation
 
  ```bibtex
  @software{agiformer2025,
+   title={AGIFORMER: Byte-Level Language Model with Hebbian Memory and Neuroplasticity},
  author={inkbytefo},
  year={2025},
+   note={Phase 7: Curriculum Learning with Dynamic Plasticity},
  url={https://github.com/inkbytefo/agi-former}
  }
  ```
 
  ## Acknowledgments
 
  - Built with PyTorch
+ - Turkish Wikipedia dataset (trwiki)
+ - Turkish Dictionary dataset (TDK)
+ - Inspired by Fast Weights, Linear Transformers, and developmental neuroscience
 
  ---
 
  **Developer:** inkbytefo
+ **Phase:** 7 (Curriculum Learning & Neuroplasticity)
+ **Status:** Production Ready
+ **Last Updated:** 2025-11-23