---
language: en
license: mit
library_name: pytorch
tags: [nanochat, gpt, pretraining]
---
# nanochat-d12-step87k
A 286M-parameter GPT model trained with the nanochat framework.
- **Architecture**: 12 layers, 768 dim, 6 heads, RoPE, GQA, ReLU² MLP
- **Context**: 2048 tokens, full attention (window_pattern=L)
- **Training**: 87,000 steps, ~5.7B tokens, Chinchilla-optimal (ratio=12)
- **Val bpb**: 0.8658
- **GPU**: RTX 4070 12GB, bf16, 28.4 hours
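
The figures above imply a few derived quantities that may be handy for sanity checks. The following is a minimal sketch that uses only the numbers listed on this card; the variable names are illustrative and are not taken from the nanochat codebase, and the token count is approximate, so the results are estimates.

```python
# Figures copied from the model card; names here are illustrative only.
n_embd = 768          # model dimension
n_head = 6            # number of attention (query) heads
steps = 87_000        # optimizer steps
total_tokens = 5.7e9  # approximate total training tokens
hours = 28.4          # wall-clock training time

# Per-head dimension follows directly from the architecture line.
head_dim = n_embd // n_head

# Average tokens consumed per optimizer step and per second of training.
tokens_per_step = total_tokens / steps
tokens_per_second = total_tokens / (hours * 3600)

print(f"head_dim          = {head_dim}")
print(f"tokens_per_step   = {tokens_per_step:,.0f}")
print(f"tokens_per_second = {tokens_per_second:,.0f}")
```

The tokens-per-step estimate lands close to 32 sequences of 2048 tokens (65,536), though the card does not state the batch size, so treat that as an inference rather than a documented setting.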