---
language: en
license: mit
library_name: pytorch
tags:
  - nanochat
  - gpt
  - pretraining
---

nanochat-d12-step87k

A 286M-parameter GPT model trained with the nanochat framework.

  • Architecture: 12 layers, model dimension 768, 6 attention heads, RoPE, GQA, ReLU² MLP
  • Context: 2048 tokens, full attention (window_pattern=L)
  • Training: 87,000 steps, ~5.7B tokens, Chinchilla-optimal (ratio=12)
  • Validation bpb (bits per byte): 0.8658
  • GPU: RTX 4070 (12 GB), bf16, 28.4 hours
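
The figures above imply a few derived quantities that are handy as a sanity check. The sketch below is plain arithmetic on the numbers listed in this card and is not taken from the nanochat code. The variable names are illustrative, and the total token count is the rounded "~5.7B", so every result is approximate.

```python
import math

# Figures copied from the list above (total_tokens is the rounded "~5.7B").
n_params = 286e6
n_embd = 768
n_head = 6
seq_len = 2048
steps = 87_000
total_tokens = 5.7e9
wall_hours = 28.4
val_bpb = 0.8658

# Per-head dimension, assuming the model dimension is split evenly across heads.
head_dim = n_embd // n_head

# Implied batch size: tokens and full-length sequences processed per optimizer step.
tokens_per_step = total_tokens / steps
seqs_per_step = tokens_per_step / seq_len

# Training tokens per parameter, counting all 286M parameters.
tokens_per_param = total_tokens / n_params

# Average throughput over the whole run.
tokens_per_sec = total_tokens / (wall_hours * 3600)

# Bits per byte converted to nats per byte (1 bit = ln 2 nats).
val_nats_per_byte = val_bpb * math.log(2)

print(f"head_dim          = {head_dim}")
print(f"tokens per step   = {tokens_per_step:,.0f}")
print(f"sequences per step= {seqs_per_step:.2f}")
print(f"tokens per param  = {tokens_per_param:.1f}")
print(f"tokens per second = {tokens_per_sec:,.0f}")
print(f"val nats per byte = {val_nats_per_byte:.4f}")
```

This gives a head dimension of 128, about 32 sequences of 2048 tokens per step, about 19.9 training tokens per parameter, and an average of roughly 55.8k tokens per second. The tokens-per-parameter figure counts all 286M parameters. The `ratio=12` setting listed above presumably uses a different parameter count, but that is a guess and not something this card states.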