# AGIFORMER: Byte-Level Language Model with Neuroplasticity
> **Status:** Phase 7 - Curriculum Learning ✅ **Complete**
> **Latest Achievement:** 20K curriculum training with 77% BPC reduction
A research implementation of a byte-level language model featuring:
- 🧠 **Hebbian Memory** with dynamic neuroplasticity
- 📚 **Curriculum Learning** (3-stage developmental approach)
- 🔄 **System 2 Reasoning** (iterative thinking loop)
- 🚀 **Linear Complexity** attention mechanism
## Quick Start
### Installation
```bash
pip install torch datasets tqdm
```
### Training (Curriculum Learning)
```bash
python train_curriculum.py # 20K steps, 3 curriculum stages
```
### Inference
```bash
python generate.py best_model_curriculum.pth
```
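Because the model operates on raw bytes, generated IDs map straight back to text with no detokenizer. A minimal, dependency-free sketch of temperature sampling over the 256-byte vocabulary (the `logits` input and `sample_byte` helper are illustrative, not the repository's API):

```python
import math
import random

def sample_byte(logits, temperature=0.8):
    """Sample one byte (0-255) from raw logits via temperature softmax."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]  # subtract max for stability
    total = sum(exps)
    r = random.random() * total
    acc = 0.0
    for byte, e in enumerate(exps):
        acc += e
        if acc >= r:
            return byte
    return len(exps) - 1

# Generated byte IDs decode directly to text; invalid UTF-8 is replaced.
generated = [77, 101, 114, 104, 97, 98, 97]  # example byte sequence
text = bytes(generated).decode("utf-8", errors="replace")
print(text)  # "Merhaba"
```

Decoding with `errors="replace"` matters for byte-level models: a sampled sequence can land mid-way through a multi-byte UTF-8 character, and replacement keeps generation from crashing.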
### Testing
```bash
python test_recall.py best_model_curriculum.pth # Memory test
python inspect_reasoning.py # System 2 diagnostics
```
## Architecture
```
Bytes → Encoder (RoPE) → Hebbian Memory → Reasoning Loop → Local RNN → Bytes
        (Patches)         (Dynamic λ)      (3 steps)        (Autoregressive)
```
### Core Components
- **ByteLatentEncoder:** Patches bytes into latent vectors with RoPE
- **HebbianMemory:** Fast weights with learnable decay + neuroplasticity (α)
- **RecurrentReasoningBlock:** 3-step iterative thinking loop (System 2)
- **LocalAutoregressiveHead:** GRU-based byte decoder
See [docs/architecture.md](docs/architecture.md) for technical details.
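The reasoning block refines the latent state over multiple passes rather than computing it once. A hedged, dependency-free sketch of a 3-step residual refinement loop (the transform `smooth` is a toy stand-in for the block's learned update; tracking the per-step delta is how a collapsed, no-op loop can be detected):

```python
def reasoning_loop(z, f, n_steps=3):
    """Iteratively refine latent state z; return final state and per-step deltas.

    The L1 norm of each update (delta-z) indicates whether the loop is
    actively refining the state or has collapsed to a no-op.
    """
    deltas = []
    for _ in range(n_steps):
        z_new = [zi + fi for zi, fi in zip(z, f(z))]  # residual update
        deltas.append(sum(abs(a - b) for a, b in zip(z_new, z)))
        z = z_new
    return z, deltas

# Toy transform: pull each coordinate halfway toward its neighbors' mean.
def smooth(z):
    n = len(z)
    return [0.5 * ((z[(i - 1) % n] + z[(i + 1) % n]) / 2 - z[i]) for i in range(n)]

z, deltas = reasoning_loop([4.0, 0.0, 2.0, 6.0], smooth)
print(deltas)  # nonzero, shrinking deltas indicate an active loop
```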
## Features
- **No Tokenization** - Universal byte-level processing
- **Linear Complexity** - O(N) attention with Hebbian memory
- **Neuroplasticity** - Dynamic memory consolidation (α: 0.1 → 0.99)
- **Curriculum Learning** - 3-stage developmental training
- **Active Reasoning** - Verified thinking loop (Δz = 12.7)
- **AMP Compatible** - Mixed-precision training with stability fixes
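The O(N) complexity comes from replacing the quadratic attention map with a fixed-size fast-weight matrix that is updated once per position. A dependency-free sketch of the idea (dimensions, the decay `lam`, and the plasticity `alpha` are illustrative constants here; the actual module uses a learnable decay and the scheduled α from the curriculum):

```python
def hebbian_attention(queries, keys, values, lam=0.9, alpha=0.5):
    """Linear-complexity attention via a Hebbian fast-weight matrix.

    Instead of an N x N attention map, a d_k x d_v state matrix S is
    updated once per position: S <- lam * S + alpha * outer(k, v).
    Each output is the query read out against S, so total cost is O(N).
    """
    d_k, d_v = len(keys[0]), len(values[0])
    S = [[0.0] * d_v for _ in range(d_k)]
    outputs = []
    for q, k, v in zip(queries, keys, values):
        # Hebbian update: decay old associations, imprint the new k-v pair.
        for i in range(d_k):
            for j in range(d_v):
                S[i][j] = lam * S[i][j] + alpha * k[i] * v[j]
        # Readout: out_j = sum_i q_i * S[i][j]
        outputs.append([sum(q[i] * S[i][j] for i in range(d_k)) for j in range(d_v)])
    return outputs

qs = ks = [[1.0, 0.0], [0.0, 1.0]]
vs = [[2.0], [3.0]]
print(hebbian_attention(qs, ks, vs))  # [[1.0], [1.5]]
```

Note the memory cost is constant in sequence length: only `S` persists between positions, which is what makes long byte sequences tractable.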
## Curriculum Learning (Phase 7)
### Training Stages
| Stage | Steps | Plasticity (α) | Data | Purpose |
|-------|-------|----------------|------|---------|
| **1. Childhood** | 0-3K | 0.10 | Dictionary | Lexical grounding |
| **2. Youth** | 3K-8K | 0.50 | Stories | Syntactic scaffolding |
| **3. Adulthood** | 8K-20K | 0.99 | Wikipedia | Semantic expansion |
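The schedule above can be expressed as a simple step-to-stage lookup. A sketch mirroring the table (the function name and the lowercase data labels are hypothetical; only the boundaries and α values come from the table):

```python
def curriculum_stage(step):
    """Map a training step to its curriculum stage, per the 3-stage table."""
    if step < 3_000:
        return {"stage": "childhood", "alpha": 0.10, "data": "dictionary"}
    if step < 8_000:
        return {"stage": "youth", "alpha": 0.50, "data": "stories"}
    return {"stage": "adulthood", "alpha": 0.99, "data": "wikipedia"}

print(curriculum_stage(1_000)["alpha"])   # 0.1
print(curriculum_stage(12_000)["stage"])  # adulthood
```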
### Results (20K Steps - Turkish Training)
**Metrics:**
- **Final BPC:** 1.85 (↓77% from initialization)
- **Best Val BPC:** 1.78
- **Training Time:** ~50 minutes (CUDA GPU)
- **Stability:** 0 NaN occurrences across 20K steps
**Progress:**
```
Step 0:   BPC = 8.04  (Random initialization)
Step 5K:  BPC = 2.23  (Initial curriculum complete)
Step 10K: BPC = 1.98  (Mid-training)
Step 20K: BPC = 1.85  (Final)
```
**Improvement:** **6.19 BPC reduction** (77% improvement)
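BPC is the cross-entropy loss converted from nats to bits. As a sanity check on the numbers above: a randomly initialized byte model is roughly uniform over 256 values, and a uniform distribution costs exactly log2(256) = 8 bits per byte, which matches the reported ~8.04 BPC at step 0. A minimal sketch of the conversion (the helper name is illustrative):

```python
import math

def bits_per_character(ce_loss_nats):
    """Convert a cross-entropy loss in nats to bits per character (byte)."""
    return ce_loss_nats / math.log(2)

# Uniform over 256 byte values costs ln(256) nats, i.e. ~8 bits per byte.
uniform_loss = math.log(256)
print(bits_per_character(uniform_loss))
```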
## Critical Fix: AMP Stability
**Problem:** Float16 overflow in Hebbian Memory with low plasticity (α=0.1)
**Solution:** Force float32 computation for memory module
```python
@torch.amp.autocast('cuda', enabled=False)
def forward(self, x):
    input_dtype = x.dtype
    x = x.float()  # Bypass AMP for numerical stability
    # ... Hebbian computation producing `out` ...
    return out.to(input_dtype)
```
This fix enables stable 20K+ step training with AMP enabled.
## Documentation
- [Architecture Guide](docs/architecture.md) - Technical deep dive
- [Training Guide](docs/training.md) - Training from scratch
- [Inference Guide](docs/inference.md) - Generation and sampling
- [API Reference](docs/api.md) - Code documentation
- [RFC 007: Curriculum Learning](docs/RFC_007_Curriculum_Learning.md) - Phase 7 design
## Model Files
- `best_model_curriculum.pth` - Best checkpoint (Val BPC: 1.78)
- `last_model_curriculum.pth` - Final model state (20K steps)
- `metrics_curriculum.json` - Full training metrics
## Next Steps
### Recommended Improvements
1. **Extended Training:** 30K-50K steps for further convergence
2. **Larger Model:** Scale up to d_model=768, n_layers=8
3. **Longer Context:** Extend the context window to 2048 bytes
4. **Fine-tuning:** Domain-specific Turkish datasets
### Research Directions
- Adaptive plasticity scheduling
- Multi-stage curriculum optimization
- Cross-lingual transfer learning
- Sparse Hebbian memory
## Citation
```bibtex
@software{agiformer2025,
  title={AGIFORMER: Byte-Level Language Model with Hebbian Memory and Neuroplasticity},
  author={inkbytefo},
  year={2025},
  note={Phase 7: Curriculum Learning with Dynamic Plasticity},
  url={https://github.com/inkbytefo/agi-former}
}
```
## License
MIT License - see [LICENSE](LICENSE) file for details.
## Acknowledgments
- Built with PyTorch
- Turkish Wikipedia dataset (trwiki)
- Turkish Dictionary dataset (TDK)
- Inspired by Fast Weights, Linear Transformers, and developmental neuroscience
---
**Developer:** inkbytefo
**Phase:** 7 (Curriculum Learning & Neuroplasticity)
**Status:** Production Ready ✅
**Last Updated:** 2025-11-23