# AGIFORMER: Byte-Level Language Model with Neuroplasticity

> **Status:** Phase 7 (Curriculum Learning) ✅ **Complete**
>
> **Latest Achievement:** 20K-step curriculum training run with a 77% BPC reduction

A research implementation of a byte-level language model featuring:

- 🧠 **Hebbian Memory** with dynamic neuroplasticity
- 📚 **Curriculum Learning** (3-stage developmental approach)
- 🔄 **System 2 Reasoning** (iterative thinking loop)
- 🚀 **Linear Complexity** attention mechanism

## Quick Start

### Installation

```bash
pip install torch datasets tqdm
```

### Training (Curriculum Learning)

```bash
python train_curriculum.py  # 20K steps, 3 curriculum stages
```

### Inference

```bash
python generate.py best_model_curriculum.pth
```

### Testing

```bash
python test_recall.py best_model_curriculum.pth  # Memory test
python inspect_reasoning.py                      # System 2 diagnostics
```

## Architecture

```
Bytes → Encoder (RoPE) → Hebbian Memory → Reasoning Loop → Local RNN → Bytes
        (Patches)        (Dynamic λ)      (3 steps)        (Autoregressive)
```

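To make the first stage concrete, here is a minimal sketch of turning text into the byte IDs (0 to 255) that the encoder consumes and grouping them into fixed-size patches. `to_byte_patches`, the patch size of 4, and zero-padding are hypothetical choices for illustration; the real `ByteLatentEncoder` may form and pad patches differently, and its learned embedding and RoPE steps are omitted here.

```python
from typing import List

PATCH_SIZE = 4  # hypothetical patch length, not taken from the AGIFORMER config
PAD_BYTE = 0    # hypothetical padding value for the last, partial patch


def to_byte_patches(text: str, patch_size: int = PATCH_SIZE) -> List[List[int]]:
    """Encode text as UTF-8 bytes and split the bytes into fixed-size patches."""
    data = list(text.encode("utf-8"))  # each element is an int in [0, 255]
    # Pad so the byte count is a multiple of patch_size.
    remainder = len(data) % patch_size
    if remainder:
        data.extend([PAD_BYTE] * (patch_size - remainder))
    return [data[i:i + patch_size] for i in range(0, len(data), patch_size)]


if __name__ == "__main__":
    print(to_byte_patches("Dağ"))    # [[68, 97, 196, 159]]
    print(to_byte_patches("abcde"))  # [[97, 98, 99, 100], [101, 0, 0, 0]]
```

Because everything is bytes, a Turkish character such as `ğ` simply becomes two byte values (196 and 159 in UTF-8), so no vocabulary or tokenizer is required.
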
### Core Components

- **ByteLatentEncoder:** Patches bytes into latent vectors with RoPE
- **HebbianMemory:** Fast weights with learnable decay + neuroplasticity (α)
- **RecurrentReasoningBlock:** 3-step iterative thinking loop (System 2)
- **LocalAutoregressiveHead:** GRU-based byte decoder

See [docs/architecture.md](docs/architecture.md) for technical details.

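The linear-complexity property comes from replacing pairwise attention with a running memory matrix. The sketch below shows one common fast-weight formulation, assuming a scalar decay λ and a scalar write strength standing in for the plasticity α (how α actually enters the update is not specified here): each position decays the memory, adds the outer product of its key and value (a Hebbian update), and reads the memory with its query. It is an illustrative recurrence, not the `HebbianMemory` source; the real module learns its decay, operates on batched tensors, and may normalize or gate the update. `hebbian_scan` is a hypothetical name.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def hebbian_scan(queries: List[Vector], keys: List[Vector], values: List[Vector],
                 decay: float, plasticity: float) -> List[Vector]:
    """Causal fast-weight memory over a non-empty sequence.

    Assumed update rule (illustrative only):
        M_t = decay * M_{t-1} + plasticity * outer(k_t, v_t)
        y_t = q_t @ M_t
    """
    d_k, d_v = len(keys[0]), len(values[0])
    memory: Matrix = [[0.0] * d_v for _ in range(d_k)]
    outputs: List[Vector] = []
    for q, k, v in zip(queries, keys, values):
        # Decay old associations, then write the new key-value pair.
        for i in range(d_k):
            for j in range(d_v):
                memory[i][j] = decay * memory[i][j] + plasticity * k[i] * v[j]
        # Read with the query; memory holds only current and past positions.
        outputs.append([sum(q[i] * memory[i][j] for i in range(d_k))
                        for j in range(d_v)])
    return outputs


if __name__ == "__main__":
    outs = hebbian_scan(
        queries=[[1.0, 0.0], [1.0, 1.0]],
        keys=[[1.0, 0.0], [0.0, 1.0]],
        values=[[2.0, 3.0], [4.0, 5.0]],
        decay=0.5, plasticity=1.0,
    )
    print(outs)  # [[2.0, 3.0], [5.0, 6.5]]
```

At the second position the first association has decayed to half strength (1.0, 1.5) and the new one (4.0, 5.0) is added on top. Each position does O(d²) work regardless of sequence length, so a length-N sequence costs O(N·d²) instead of the O(N²·d) of softmax attention.
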
## Features

- ✅ **No Tokenization** - Universal byte-level processing
- ✅ **Linear Complexity** - O(N) attention with Hebbian memory
- ✅ **Neuroplasticity** - Dynamic memory consolidation (α: 0.1 → 0.99)
- ✅ **Curriculum Learning** - 3-stage developmental training
- ✅ **Active Reasoning** - Verified thinking loop (Δz = 12.7)
- ✅ **AMP Compatible** - Mixed precision training with stability fixes

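How Δz is computed is not specified here. One plausible reading, assumed in this sketch, is the L2 distance between the latent state before and after the reasoning loop: a value near zero would mean the extra iterations leave the representation unchanged. The sketch applies a residual update z ← z + f(z) for three steps; `reasoning_loop` and the toy step function are hypothetical stand-ins for `RecurrentReasoningBlock`.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def reasoning_loop(z0: Vector, step_fn: Callable[[Vector], Vector],
                   n_steps: int = 3) -> Tuple[Vector, float]:
    """Iterate z <- z + step_fn(z) and return (final latent, delta_z).

    delta_z is the L2 distance between the final and initial latents
    (an assumed definition of Δz, not taken from the AGIFORMER code).
    """
    z = list(z0)
    for _ in range(n_steps):
        update = step_fn(z)  # stand-in for one pass of the reasoning block
        z = [zi + ui for zi, ui in zip(z, update)]
    delta_z = math.sqrt(sum((a - b) ** 2 for a, b in zip(z, z0)))
    return z, delta_z


if __name__ == "__main__":
    # Toy step that nudges every coordinate by +1, so after 3 steps z = [3, 3, 3, 3].
    z, delta_z = reasoning_loop([0.0] * 4, lambda z: [1.0] * len(z))
    print(z, delta_z)  # [3.0, 3.0, 3.0, 3.0] 6.0
```

A step function that always returns zeros gives Δz = 0.0, the degenerate "no thinking" case that a diagnostic of this kind would flag.
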
## Curriculum Learning (Phase 7)

### Training Stages

| Stage | Steps | Plasticity (α) | Data | Purpose |
|-------|-------|----------------|------|---------|
| **1. Childhood** | 0-3K | 0.10 | Dictionary | Lexical grounding |
| **2. Youth** | 3K-8K | 0.50 | Stories | Syntactic scaffolding |
| **3. Adulthood** | 8K-20K | 0.99 | Wikipedia | Semantic expansion |

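The stage table amounts to a step-indexed lookup. A minimal sketch, assuming half-open step ranges (a stage runs from its first step up to, but not including, the next stage's first step) and that any steps beyond 20K stay in Adulthood; `Stage`, `CURRICULUM`, and `stage_for_step` are hypothetical names, not functions from `train_curriculum.py`. This version switches α abruptly at each boundary; whether the training script does the same or interpolates is not stated here.

```python
from typing import NamedTuple


class Stage(NamedTuple):
    name: str
    start_step: int    # first step of the stage (inclusive)
    end_step: int      # first step of the next stage (exclusive)
    plasticity: float  # alpha
    data: str


# Boundaries, alpha values, and data sources from the Training Stages table.
CURRICULUM = (
    Stage("Childhood", 0, 3_000, 0.10, "Dictionary"),
    Stage("Youth", 3_000, 8_000, 0.50, "Stories"),
    Stage("Adulthood", 8_000, 20_000, 0.99, "Wikipedia"),
)


def stage_for_step(step: int) -> Stage:
    """Return the curriculum stage that is active at a global training step."""
    for stage in CURRICULUM:
        if stage.start_step <= step < stage.end_step:
            return stage
    if step >= CURRICULUM[-1].end_step:
        # Assumption: extended training (e.g. 30K-50K steps) stays in the last stage.
        return CURRICULUM[-1]
    raise ValueError(f"step must be non-negative, got {step}")


if __name__ == "__main__":
    for step in (0, 5_000, 12_000):
        s = stage_for_step(step)
        print(step, s.name, s.plasticity, s.data)
```

Under these assumptions, step 5,000 falls in the Youth stage (α = 0.5, Stories data).
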
### Results (20K Steps, Turkish Training)

**Metrics:**

- **Final BPC:** 1.85 (↓77% from initialization)
- **Best Val BPC:** 1.78
- **Training Time:** ~50 minutes (CUDA GPU)
- **Stability:** 0 NaN occurrences across 20K steps

**Progress:**

```
Step 0:   BPC = 8.04 (Random initialization)
Step 5K:  BPC = 2.23 (Initial curriculum complete)
Step 10K: BPC = 1.98 (Mid-training)
Step 20K: BPC = 1.85 (Final)
```

**Improvement:** **6.19 BPC reduction** (77% below the initial 8.04 BPC)

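BPC is the mean cross-entropy per byte expressed in bits. A minimal sketch of the conversion, assuming the training loss is a mean cross-entropy in nats (the default for PyTorch's `torch.nn.functional.cross_entropy`); `bpc_from_nats` is a hypothetical helper. It also shows why the step-0 value sits near 8: a model that spreads probability evenly over all 256 byte values scores exactly log2(256) = 8 bits per byte. The last lines recompute the reported reduction from the progress log.

```python
import math


def bpc_from_nats(mean_loss_nats: float) -> float:
    """Convert mean cross-entropy per byte from nats to bits (BPC)."""
    return mean_loss_nats / math.log(2)


# Uniform prediction over 256 byte values: loss = ln(256) nats = 8 bits per byte.
uniform_bpc = bpc_from_nats(math.log(256))

# Reported values from the progress log above.
initial_bpc, final_bpc = 8.04, 1.85
reduction = initial_bpc - final_bpc
relative = reduction / initial_bpc

if __name__ == "__main__":
    print(f"uniform baseline: {uniform_bpc:.2f} BPC")         # 8.00
    print(f"reduction: {reduction:.2f} BPC ({relative:.0%})")  # 6.19 BPC (77%)
```

The printed 6.19 BPC and 77% match the reported improvement, and the 8.04 starting point is just above the 8.00 uniform baseline, as expected for an untrained model.
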
## Critical Fix: AMP Stability

**Problem:** Float16 overflow in the Hebbian memory at low plasticity (α = 0.1).

**Solution:** Disable autocast for the memory module, run its computation in float32, and cast the output back to the input dtype.

```python
import torch


@torch.amp.autocast('cuda', enabled=False)
def forward(self, x):  # HebbianMemory.forward
    input_dtype = x.dtype  # remember the caller's dtype (float16 under AMP)
    x = x.float()          # bypass AMP: run the Hebbian update in float32
    # ... Hebbian computation producing `out` ...
    return out.to(input_dtype)  # cast back so downstream layers see the expected dtype
```

This fix enables stable training for 20K+ steps with AMP enabled.

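The float32 workaround matters because half precision has a narrow range: the largest finite float16 value is 65504, while float32 reaches about 3.4 × 10^38. The sketch below does not reproduce the AGIFORMER failure; it only illustrates the general mechanism with a hypothetical decayed accumulator m_t = λ·m_{t-1} + w, a simplified stand-in for one fast-weight memory cell. Python's `struct` module packs IEEE 754 binary16 with the `'e'` format, so the check runs without a GPU; `fits_in`, `fits_in_float16`, `fits_in_float32`, and `first_float16_overflow` are hypothetical helpers.

```python
import math
import struct
from typing import Optional


def fits_in(value: float, fmt: str) -> bool:
    """True if `value` stays finite when stored in the given struct format.

    'e' is IEEE 754 binary16 (the layout of torch.float16); 'f' is binary32.
    """
    if not math.isfinite(value):
        return False
    try:
        struct.pack("<" + fmt, value)
        return True
    except OverflowError:  # raised when the value rounds past the format's max
        return False


def fits_in_float16(value: float) -> bool:
    return fits_in(value, "e")


def fits_in_float32(value: float) -> bool:
    return fits_in(value, "f")


def first_float16_overflow(write: float, decay: float, steps: int) -> Optional[int]:
    """Run m_t = decay * m_{t-1} + write in float64 and return the first step
    whose value would no longer fit in float16 (None if it never overflows)."""
    m = 0.0
    for t in range(1, steps + 1):
        m = decay * m + write
        if not fits_in_float16(m):
            return t
    return None


if __name__ == "__main__":
    print(first_float16_overflow(1000.0, 0.99, steps=1000))  # 106
    print(first_float16_overflow(100.0, 0.99, steps=1000))   # None
```

With w = 1000 and λ = 0.99 the accumulator heads toward 100,000 and leaves the float16 range at step 106, even though every value is tiny by float32 standards; with w = 100 it settles near 10,000 and never overflows.
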
## Documentation

- [Architecture Guide](docs/architecture.md) - Technical deep dive
- [Training Guide](docs/training.md) - Training from scratch
- [Inference Guide](docs/inference.md) - Generation and sampling
- [API Reference](docs/api.md) - Code documentation
- [RFC 007: Curriculum Learning](docs/RFC_007_Curriculum_Learning.md) - Phase 7 design

## Model Files

- `best_model_curriculum.pth` - Best checkpoint (Val BPC: 1.78)
- `last_model_curriculum.pth` - Final model state (20K steps)
- `metrics_curriculum.json` - Full training metrics

## Next Steps

### Recommended Improvements

1. **Extended Training:** 30K-50K steps for further convergence
2. **Larger Model:** Increase to `d_model=768`, `n_layers=8`
3. **Longer Context:** Extend the context window to 2048 bytes
4. **Fine-tuning:** Domain-specific Turkish datasets

### Research Directions

- Adaptive plasticity scheduling
- Multi-stage curriculum optimization
- Cross-lingual transfer learning
- Sparse Hebbian memory

## Citation

```bibtex
@software{agiformer2025,
  title={AGIFORMER: Byte-Level Language Model with Hebbian Memory and Neuroplasticity},
  author={inkbytefo},
  year={2025},
  note={Phase 7: Curriculum Learning with Dynamic Plasticity},
  url={https://github.com/inkbytefo/agi-former}
}
```

## License

MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Built with PyTorch
- Turkish Wikipedia dataset (trwiki)
- Turkish Dictionary dataset (TDK)
- Inspired by Fast Weights, Linear Transformers, and developmental neuroscience

---

**Developer:** inkbytefo

**Phase:** 7 (Curriculum Learning & Neuroplasticity)

**Status:** Production Ready ✅

**Last Updated:** 2025-11-23