Phase 7: Curriculum Learning (20K steps, BPC 1.78)

# AGIFORMER: Byte-Level Language Model with Neuroplasticity

> **Status:** Phase 7 - Curriculum Learning ✅ **Complete**
> **Latest Achievement:** 20K curriculum training with 77% BPC reduction

A research implementation of a byte-level language model featuring:

- 🧠 **Hebbian Memory** with dynamic neuroplasticity
- 📚 **Curriculum Learning** (3-stage developmental approach)
- 🔄 **System 2 Reasoning** (iterative thinking loop)
- 🚀 **Linear Complexity** attention mechanism

## Quick Start

### Installation

```bash
pip install torch datasets tqdm
```

### Training (Curriculum Learning)

```bash
python train_curriculum.py  # 20K steps, 3 curriculum stages
```

### Inference

```bash
python generate.py best_model_curriculum.pth
```

### Testing

```bash
python test_recall.py best_model_curriculum.pth  # Memory test
python inspect_reasoning.py                      # System 2 diagnostics
```

## Architecture

```
Bytes → Encoder (RoPE) → Hebbian Memory → Reasoning Loop → Local RNN → Bytes
        (Patches)        (Dynamic λ)      (3 steps)        (Autoregressive)
```

### Core Components

- **ByteLatentEncoder:** Patches bytes into latent vectors with RoPE
- **HebbianMemory:** Fast weights with learnable decay + neuroplasticity (α)
- **RecurrentReasoningBlock:** 3-step iterative thinking loop (System 2)
- **LocalAutoregressiveHead:** GRU-based byte decoder

See [docs/architecture.md](docs/architecture.md) for technical details.

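Wired together, these four modules form the pipeline in the diagram above. A minimal sketch of that composition follows; the import path and constructor signatures are assumptions for illustration, not the repo's actual API:

```python
import torch.nn as nn
# Hypothetical import path; class names match the components listed above
from agiformer.model import (ByteLatentEncoder, HebbianMemory,
                             RecurrentReasoningBlock, LocalAutoregressiveHead)

class AGIFormer(nn.Module):
    def __init__(self, d_model: int = 512, n_reason_steps: int = 3):
        super().__init__()
        self.encoder = ByteLatentEncoder(d_model)        # bytes → patch latents (RoPE)
        self.memory = HebbianMemory(d_model)             # fast weights, dynamic λ
        self.reasoner = RecurrentReasoningBlock(d_model, n_reason_steps)  # System 2
        self.head = LocalAutoregressiveHead(d_model)     # GRU-based byte decoder

    def forward(self, byte_ids):            # byte_ids: (B, n_bytes) in [0, 255]
        z = self.encoder(byte_ids)          # (B, n_patches, d_model)
        z = self.memory(z)                  # Hebbian read/write over patch latents
        z = self.reasoner(z)                # iterative refinement loop
        return self.head(z, byte_ids)       # per-byte logits over 256 values
```
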
## Features

✅ **No Tokenization** - Universal byte-level processing
✅ **Linear Complexity** - O(N) attention with Hebbian memory (sketched below)
✅ **Neuroplasticity** - Dynamic memory consolidation (α: 0.1 → 0.99)
✅ **Curriculum Learning** - 3-stage developmental training
✅ **Active Reasoning** - Verified thinking loop (Δz = 12.7)
✅ **AMP Compatible** - Mixed precision training with stability fixes

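The Linear Complexity and Neuroplasticity bullets describe the same mechanism: a fast-weight matrix that is decayed and updated with a Hebbian outer product in a single O(N) scan. A minimal sketch, assuming a sigmoid-parameterized decay λ and a scalar plasticity α set by the curriculum (the real module's gating and normalization may differ):

```python
import torch
import torch.nn as nn

class HebbianMemory(nn.Module):
    """Sketch of Hebbian fast weights: one O(N) scan, learnable decay, plasticity α."""

    def __init__(self, d_model: int, alpha: float = 0.1):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.decay_logit = nn.Parameter(torch.tensor(2.0))  # learnable λ via sigmoid
        self.alpha = alpha  # plasticity; the curriculum schedules this 0.1 → 0.99

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, T, D)
        q, k, v = self.q(x), self.k(x), self.v(x)
        lam = torch.sigmoid(self.decay_logit)
        M = x.new_zeros(x.size(0), x.size(-1), x.size(-1))  # fast-weight matrix per batch
        outs = []
        for t in range(x.size(1)):  # single pass over the sequence: linear complexity
            # Hebbian update: decay old associations, write the new key-value outer product
            M = lam * M + self.alpha * torch.einsum('bd,be->bde', k[:, t], v[:, t])
            outs.append(torch.einsum('bd,bde->be', q[:, t], M))  # read with the query
        return torch.stack(outs, dim=1)
```
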
## Curriculum Learning (Phase 7)

### Training Stages

| Stage | Steps | Plasticity (α) | Data | Purpose |
|-------|-------|----------------|------|---------|
| **1. Childhood** | 0-3K | 0.10 | Dictionary | Lexical grounding |
| **2. Youth** | 3K-8K | 0.50 | Stories | Syntactic scaffolding |
| **3. Adulthood** | 8K-20K | 0.99 | Wikipedia | Semantic expansion |

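In code, the schedule reduces to a lookup from step to (α, data source). The boundaries below come straight from the table; the dataset identifiers are hypothetical placeholders for whatever `train_curriculum.py` actually loads:

```python
def curriculum_stage(step: int) -> tuple[float, str]:
    """Return (plasticity α, data source) for a training step; boundaries from the table."""
    if step < 3_000:
        return 0.10, "dictionary"  # Childhood: lexical grounding
    if step < 8_000:
        return 0.50, "stories"     # Youth: syntactic scaffolding
    return 0.99, "wikipedia"       # Adulthood: semantic expansion

alpha, source = curriculum_stage(step=9_000)  # → (0.99, "wikipedia")
```

The trainer would then set the memory module's α and swap data loaders whenever the stage changes.
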
### Results (20K Steps - Turkish Training)

**Metrics:**

- **Final BPC:** 1.85 (↓77% from initialization)
- **Best Val BPC:** 1.78
- **Training Time:** ~50 minutes (CUDA GPU)
- **Stability:** 0 NaN occurrences across 20K steps

**Progress:**

```
Step 0:   BPC = 8.04  (Random initialization)
Step 5K:  BPC = 2.23  (Initial curriculum complete)
Step 10K: BPC = 1.98  (Mid-training)
Step 20K: BPC = 1.85  (Final)
```

**Improvement:** 6.19 BPC (8.04 → 1.85), a 77% reduction.

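For reference, BPC for a byte-level model is just the per-byte cross-entropy converted from nats to bits, assuming the training loss is standard cross-entropy over 256 byte classes:

```python
import math

def bpc_from_loss(ce_loss_nats: float) -> float:
    # per-byte cross-entropy in nats → bits per character (byte)
    return ce_loss_nats / math.log(2)

print(round(bpc_from_loss(1.282), 2))  # 1.85 — a final loss of ~1.28 nats gives the BPC above
```
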
## Critical Fix: AMP Stability

**Problem:** Float16 overflow in the Hebbian memory at low plasticity (α = 0.1)
**Solution:** Force float32 computation for the memory module:

```python
@torch.amp.autocast('cuda', enabled=False)
def forward(self, x):
    input_dtype = x.dtype       # remember the caller's dtype (float16 under AMP)
    x = x.float()               # bypass AMP for numerical stability
    # ... Hebbian computation producing `out` in float32 ...
    return out.to(input_dtype)  # cast back so downstream layers see their usual dtype
```

This fix enables stable 20K+ step training with AMP enabled.

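The rest of the network still runs under autocast; only the memory module opts out. A sketch of the surrounding training step under that assumption (the loader, model, and optimizer names are hypothetical):

```python
import torch

scaler = torch.amp.GradScaler('cuda')

for x, y in loader:                       # hypothetical (input bytes, target bytes) batches
    optimizer.zero_grad(set_to_none=True)
    with torch.amp.autocast('cuda'):      # float16 everywhere except HebbianMemory
        logits = model(x)                 # (B, T, 256)
        loss = torch.nn.functional.cross_entropy(logits.transpose(1, 2), y)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```
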
## Documentation

- [Architecture Guide](docs/architecture.md) - Technical deep dive
- [Training Guide](docs/training.md) - Training from scratch
- [Inference Guide](docs/inference.md) - Generation and sampling
- [API Reference](docs/api.md) - Code documentation
- [RFC 007: Curriculum Learning](docs/RFC_007_Curriculum_Learning.md) - Phase 7 design

## Model Files

- `best_model_curriculum.pth` - Best checkpoint (Val BPC: 1.78)
- `last_model_curriculum.pth` - Final model state (20K steps)
- `metrics_curriculum.json` - Full training metrics

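Loading the best checkpoint for inference might look like the following; whether the `.pth` file holds a raw `state_dict` or a wrapper dict with a `'model'` key depends on how the trainer saved it:

```python
import torch

ckpt = torch.load('best_model_curriculum.pth', map_location='cpu')
state = ckpt.get('model', ckpt)  # unwrap if the trainer saved a {'model': ...} dict
model.load_state_dict(state)
model.eval()
```
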
## Next Steps

### Recommended Improvements

1. **Extended Training:** 30K-50K steps for further convergence
2. **Larger Model:** Increase to d_model=768, n_layers=8 (see the config sketch after this list)
3. **Longer Context:** Extend to a 2048-token window
4. **Fine-tuning:** Domain-specific Turkish datasets

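As a rough sketch, items 1-3 amount to a config change; the baseline values here are assumptions about the Phase 7 defaults, not values taken from the repo:

```python
# Hypothetical configs; only the scaled values come from the list above
baseline = dict(d_model=512, n_layers=6, context_len=1024, train_steps=20_000)
scaled   = dict(d_model=768, n_layers=8, context_len=2048, train_steps=50_000)
```
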
### Research Directions

- Adaptive plasticity scheduling
- Multi-stage curriculum optimization
- Cross-lingual transfer learning
- Sparse Hebbian memory

## Citation

```bibtex
@software{agiformer2025,
  title={AGIFORMER: Byte-Level Language Model with Hebbian Memory and Neuroplasticity},
  author={inkbytefo},
  year={2025},
  note={Phase 7: Curriculum Learning with Dynamic Plasticity},
  url={https://github.com/inkbytefo/agi-former}
}
```

## License

MIT License - see [LICENSE](LICENSE) file for details.

## Acknowledgments

- Built with PyTorch
- Turkish Wikipedia dataset (trwiki)
- Turkish Dictionary dataset (TDK)
- Inspired by Fast Weights, Linear Transformers, and developmental neuroscience

---

**Developer:** inkbytefo
**Phase:** 7 (Curriculum Learning & Neuroplasticity)
**Status:** Production Ready ✅
**Last Updated:** 2025-11-23