CSE 251B NanoGPT Baseline (Day 3)
Day-3 baseline checkpoint for the UCSD CSE 251B Spring 2026 NanoGPT competition.
- 50.93M params (n_layer=8, n_head=8, n_embd=512, block_size=1024)
- Trained on FineWeb-Edu (~983M tokens, 10k iters, AdamW, cosine LR 6e-4 → 6e-5)
- Validation perplexity (PPL) on the contest val.bin: 41.72
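
The figures in the list above can be sanity-checked with a few lines of arithmetic. The sketch below is not the training code. It assumes a nanoGPT-style GPT-2 architecture with bias-free Linear and LayerNorm layers, tied input/output embeddings, a vocabulary padded to 50,304 entries, position embeddings excluded from the reported count, and no LR warmup. None of these are stated in this card; `VOCAB_SIZE`, `WARMUP_ITERS`, `count_params`, and `cosine_lr` are hypothetical names chosen for illustration.

```python
import math

# Values stated in the model card.
N_LAYER, N_HEAD, N_EMBD, BLOCK_SIZE = 8, 8, 512, 1024
MAX_ITERS, MAX_LR, MIN_LR = 10_000, 6e-4, 6e-5
VAL_PPL = 41.72

# Assumptions, NOT stated in the card.
VOCAB_SIZE = 50_304   # GPT-2 vocabulary padded up to a multiple of 64
WARMUP_ITERS = 0      # warmup length is unknown; 0 keeps the sketch simple


def count_params(n_layer: int, n_embd: int, vocab_size: int) -> int:
    """Count weights of a bias-free GPT-2-style model.

    Assumes tied token embeddings / output head and leaves out the
    position embeddings (block_size * n_embd), so block_size is not needed.
    """
    attn = 3 * n_embd * n_embd + n_embd * n_embd  # fused QKV + output projection
    mlp = 2 * (4 * n_embd * n_embd)               # up- and down-projection
    norms = 2 * n_embd                            # two LayerNorm weight vectors
    per_layer = attn + mlp + norms
    token_emb = vocab_size * n_embd
    final_norm = n_embd
    return token_emb + n_layer * per_layer + final_norm


def cosine_lr(it: int, max_iters: int = MAX_ITERS, max_lr: float = MAX_LR,
              min_lr: float = MIN_LR, warmup_iters: int = WARMUP_ITERS) -> float:
    """Cosine decay from max_lr to min_lr, with optional linear warmup."""
    if it < warmup_iters:
        return max_lr * (it + 1) / (warmup_iters + 1)
    if it >= max_iters:
        return min_lr
    progress = (it - warmup_iters) / (max_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 down to 0
    return min_lr + coeff * (max_lr - min_lr)


n_params = count_params(N_LAYER, N_EMBD, VOCAB_SIZE)
val_loss = math.log(VAL_PPL)  # perplexity = exp(mean cross-entropy in nats)

if __name__ == "__main__":
    print(f"parameters: {n_params / 1e6:.2f}M")
    print(f"LR at iters 0 / 5000 / 10000: "
          f"{cosine_lr(0):.1e} / {cosine_lr(5_000):.1e} / {cosine_lr(10_000):.1e}")
    print(f"val loss implied by PPL {VAL_PPL}: {val_loss:.3f} nats/token")
```

Under these assumptions the count comes to 50,930,176, which agrees with the 50.93M stated above, and a perplexity of 41.72 corresponds to a cross-entropy of about 3.73 nats per token. If the ~983M figure is the number of tokens processed over the 10k iterations, that is roughly 98.3k tokens per iteration, consistent with 96 sequences of 1024 tokens per step (96 × 1024 × 10,000 = 983,040,000). The actual batch size and gradient-accumulation settings are not given here.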
This is a snapshot for the milestone report, not the final submission.