---
language: en
license: mit
library_name: pytorch
tags: [nanochat, gpt, pretraining]
---

# nanochat-d12-step87k

A 286M-parameter GPT model trained with the nanochat framework.

- **Architecture**: 12 layers, 768 dim, 6 heads, RoPE, GQA, ReLU² MLP
- **Context**: 2048 tokens, full attention (window_pattern=L)
- **Training**: 87,000 steps, ~5.7B tokens, Chinchilla-optimal (ratio=12)
- **Val bpb**: 0.8658
- **GPU**: RTX 4070 12GB, bf16, 28.4 hours
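
The figures above imply a few derived quantities that are handy as a sanity check. The sketch below is plain arithmetic on the numbers in this card, not output from the nanochat codebase, and all variable names are illustrative. It assumes the 768-dim width is split evenly across the 6 heads and treats ~5.7B tokens as exactly 5.7e9. It does not re-derive `ratio=12`, because the card does not say which parameter count that ratio is measured against.

```python
# Figures copied from the model card; names are illustrative, not nanochat's.
N_PARAMS = 286e6   # total parameters
N_EMBD = 768       # model dimension
N_HEAD = 6         # attention heads
SEQ_LEN = 2048     # context length in tokens
STEPS = 87_000     # optimizer steps
TOKENS = 5.7e9     # approximate training tokens (card says ~5.7B)
HOURS = 28.4       # wall-clock training time

# Per-head width, assuming the model dimension is split evenly across heads.
head_dim = N_EMBD // N_HEAD

# Average batch size implied by the token and step counts.
tokens_per_step = TOKENS / STEPS
seqs_per_step = tokens_per_step / SEQ_LEN

# Training tokens per parameter, measured against all 286M parameters.
tokens_per_param = TOKENS / N_PARAMS

# Average throughput over the whole run.
tokens_per_sec = TOKENS / (HOURS * 3600)

print(f"head_dim          = {head_dim}")
print(f"tokens per step   = {tokens_per_step:,.0f}")
print(f"sequences per step= {seqs_per_step:.1f}")
print(f"tokens per param  = {tokens_per_param:.1f}")
print(f"tokens per second = {tokens_per_sec:,.0f}")
```

The implied batch is close to 65,536 tokens per step (32 sequences of 2048 tokens), which would give 87,000 × 65,536 ≈ 5.70B tokens. That batch size is an inference from the rounded totals, not a setting stated in this card.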