# AGIFORMER: Byte-Level Language Model with Neuroplasticity

> **Status:** Phase 7 - Curriculum Learning ✅ **Complete**
> **Latest Achievement:** 20K-step curriculum training with a 77% BPC reduction

A research implementation of a byte-level language model featuring:

- 🧠 **Hebbian Memory** with dynamic neuroplasticity
- 📚 **Curriculum Learning** (3-stage developmental approach)
- 🔄 **System 2 Reasoning** (iterative thinking loop)
- 🚀 **Linear-Complexity** attention mechanism

## Quick Start

### Installation

```bash
pip install torch datasets tqdm
```

### Training (Curriculum Learning)

```bash
python train_curriculum.py  # 20K steps, 3 curriculum stages
```

### Inference

```bash
python generate.py best_model_curriculum.pth
```

### Testing

```bash
python test_recall.py best_model_curriculum.pth  # Memory test
python inspect_reasoning.py                      # System 2 diagnostics
```

## Architecture

```
Bytes → Encoder (RoPE) → Hebbian Memory → Reasoning Loop → Local RNN → Bytes
        (Patches)        (Dynamic λ)      (3 steps)        (Autoregressive)
```

### Core Components

- **ByteLatentEncoder:** Patches bytes into latent vectors with RoPE
- **HebbianMemory:** Fast weights with learnable decay + neuroplasticity (α)
- **RecurrentReasoningBlock:** 3-step iterative thinking loop (System 2)
- **LocalAutoregressiveHead:** GRU-based byte decoder

See [docs/architecture.md](docs/architecture.md) for technical details.

## Features

✅ **No Tokenization** - Universal byte-level processing
✅ **Linear Complexity** - O(N) attention with Hebbian memory
✅ **Neuroplasticity** - Dynamic memory consolidation (α: 0.1 → 0.99)
✅ **Curriculum Learning** - 3-stage developmental training
✅ **Active Reasoning** - Verified thinking loop (Δz = 12.7)
✅ **AMP Compatible** - Mixed-precision training with stability fixes

## Curriculum Learning (Phase 7)

### Training Stages

| Stage | Steps | Plasticity (α) | Data | Purpose |
|-------|-------|----------------|------|---------|
| **1. Childhood** | 0-3K | 0.10 | Dictionary | Lexical grounding |
| **2. Youth** | 3K-8K | 0.50 | Stories | Syntactic scaffolding |
| **3. Adulthood** | 8K-20K | 0.99 | Wikipedia | Semantic expansion |

### Results (20K Steps - Turkish Training)

**Metrics:**

- **Final BPC:** 1.85 (↓77% from initialization)
- **Best Val BPC:** 1.78
- **Training Time:** ~50 minutes (CUDA GPU)
- **Stability:** 0 NaN occurrences across 20K steps

**Progress:**

```
Step 0:   BPC = 8.04  (Random initialization)
Step 5K:  BPC = 2.23  (Initial curriculum complete)
Step 10K: BPC = 1.98  (Mid-training)
Step 20K: BPC = 1.85  (Final)
```

**Improvement:** **6.19 BPC reduction** (77% improvement)

## Critical Fix: AMP Stability

**Problem:** Float16 overflow in the Hebbian Memory with low plasticity (α = 0.1)

**Solution:** Force float32 computation for the memory module

```python
@torch.amp.autocast('cuda', enabled=False)
def forward(self, x):
    input_dtype = x.dtype
    x = x.float()  # Bypass AMP for numerical stability
    # ... Hebbian computation producing `out` ...
    return out.to(input_dtype)
```

This fix enables stable 20K+ step training with AMP enabled.
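To make the memory mechanism concrete, here is a minimal sketch of a Hebbian fast-weight layer, assuming the standard decay-plus-outer-product update implied by the description above (learnable decay λ, plasticity coefficient α, forced float32 under AMP). Class, method, and parameter names are illustrative and are not the repository's actual API.

```python
import torch
import torch.nn as nn

class HebbianMemorySketch(nn.Module):
    """Illustrative fast-weight memory (sketch, not the repo's implementation).

    A d x d memory M is decayed by a learnable lambda and written to with
    strength alpha, the plasticity coefficient the curriculum anneals from
    0.10 to 0.99.
    """

    def __init__(self, d_model: int, alpha: float = 0.1):
        super().__init__()
        self.q_proj = nn.Linear(d_model, d_model)
        self.k_proj = nn.Linear(d_model, d_model)
        self.v_proj = nn.Linear(d_model, d_model)
        self.decay = nn.Parameter(torch.tensor(0.9))  # learnable decay (lambda)
        self.alpha = alpha                            # plasticity, set by the curriculum

    @torch.amp.autocast('cuda', enabled=False)        # keep the recurrence in float32
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        input_dtype = x.dtype
        x = x.float()
        q, k, v = self.q_proj(x), self.k_proj(x), self.v_proj(x)
        B, T, D = x.shape
        memory = x.new_zeros(B, D, D)                 # fast-weight matrix M
        lam = torch.sigmoid(self.decay)               # keep decay in (0, 1)
        outputs = []
        for t in range(T):
            # Hebbian write: decay old associations, add the new k-v outer product
            memory = lam * memory + self.alpha * torch.einsum('bd,be->bde', k[:, t], v[:, t])
            # Read: query the associative memory
            outputs.append(torch.einsum('bd,bde->be', q[:, t], memory))
        out = torch.stack(outputs, dim=1)
        return out.to(input_dtype)
```

Because the memory has a fixed d×d size and is updated once per position, the cost grows linearly with sequence length, which is where the O(N) claim comes from.

The 3-stage schedule from the curriculum table can likewise be summarized as a step-to-stage mapping. The function name and data labels below are assumptions for illustration; the actual logic lives in `train_curriculum.py`.

```python
def curriculum_stage(step: int):
    """Map a training step to (plasticity alpha, data source).

    Illustrative only: boundaries and labels follow the curriculum table;
    see train_curriculum.py for the real schedule.
    """
    if step < 3_000:               # Stage 1 "Childhood": lexical grounding
        return 0.10, "dictionary"
    if step < 8_000:               # Stage 2 "Youth": syntactic scaffolding
        return 0.50, "stories"
    return 0.99, "wikipedia"       # Stage 3 "Adulthood": semantic expansion
```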
## Documentation

- [Architecture Guide](docs/architecture.md) - Technical deep dive
- [Training Guide](docs/training.md) - Training from scratch
- [Inference Guide](docs/inference.md) - Generation and sampling
- [API Reference](docs/api.md) - Code documentation
- [RFC 007: Curriculum Learning](docs/RFC_007_Curriculum_Learning.md) - Phase 7 design

## Model Files

- `best_model_curriculum.pth` - Best checkpoint (Val BPC: 1.78)
- `last_model_curriculum.pth` - Final model state (20K steps)
- `metrics_curriculum.json` - Full training metrics

## Next Steps

### Recommended Improvements

1. **Extended Training:** 30K-50K steps for further convergence
2. **Larger Model:** Increase to d_model=768, n_layers=8
3. **Longer Context:** Extend the context window to 2048
4. **Fine-tuning:** Domain-specific Turkish datasets

### Research Directions

- Adaptive plasticity scheduling
- Multi-stage curriculum optimization
- Cross-lingual transfer learning
- Sparse Hebbian memory

## Citation

```bibtex
@software{agiformer2025,
  title={AGIFORMER: Byte-Level Language Model with Hebbian Memory and Neuroplasticity},
  author={inkbytefo},
  year={2025},
  note={Phase 7: Curriculum Learning with Dynamic Plasticity},
  url={https://github.com/inkbytefo/agi-former}
}
```

## License

MIT License - see [LICENSE](LICENSE) file for details.

## Acknowledgments

- Built with PyTorch
- Turkish Wikipedia dataset (trwiki)
- Turkish Dictionary dataset (TDK)
- Inspired by Fast Weights, Linear Transformers, and developmental neuroscience

---

**Developer:** inkbytefo
**Phase:** 7 (Curriculum Learning & Neuroplasticity)
**Status:** Production Ready ✅
**Last Updated:** 2025-11-23