---
language: en
license: mit
library_name: pytorch
tags: [nanochat, gpt, pretraining]
---

# nanochat-d12-step87k

A 286M-parameter GPT model trained with the nanochat framework.

- **Architecture**: 12 layers, 768 dim, 6 heads, RoPE, GQA, ReLU² MLP
- **Context**: 2048 tokens, full attention (window_pattern=L)
- **Training**: 87,000 steps, ~5.7B tokens, Chinchilla-optimal (ratio=12)
- **Val bpb**: 0.8658
- **GPU**: RTX 4070 12GB, bf16, 28.4 hours
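
The training figures above can be cross-checked with a little arithmetic. The sketch below is only a sanity check: the five constants are copied from this card, the derived values are approximate because the token count is given as "~5.7B", and the variable names are illustrative rather than nanochat settings. The card does not say how `ratio=12` is defined, so the tokens-per-parameter value computed here is not necessarily the same quantity.

```python
# Figures copied from the model card; everything below them is derived.
PARAMS = 286e6    # total parameters
STEPS = 87_000    # optimizer steps
TOKENS = 5.7e9    # approximate training tokens
CONTEXT = 2048    # sequence length in tokens
HOURS = 28.4      # wall-clock training time

# Implied batch size, in tokens and in full-length sequences.
tokens_per_step = TOKENS / STEPS
sequences_per_step = tokens_per_step / CONTEXT

# Data-to-model ratio over ALL parameters (may differ from the card's ratio).
tokens_per_param = TOKENS / PARAMS

# Average throughput over the whole run.
tokens_per_second = TOKENS / (HOURS * 3600)

print(f"tokens/step:     {tokens_per_step:,.0f}")
print(f"sequences/step:  {sequences_per_step:.1f}")
print(f"tokens/param:    {tokens_per_param:.1f}")
print(f"tokens/second:   {tokens_per_second:,.0f}")
```

Under these numbers each step covers roughly 65.5k tokens, which is about 32 sequences of 2048 tokens, and the run averages on the order of 56k tokens per second on the single GPU.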