CSE 251B NanoGPT Baseline (Day 3)

Day-3 baseline checkpoint for the UCSD CSE 251B Spring 2026 NanoGPT competition.

  • 50.93M params (n_layer=8, n_head=8, n_embd=512, block_size=1024)
  • Trained on FineWeb-Edu (~983M tokens, 10k iters, AdamW, cosine LR 6e-4 → 6e-5)
  • val PPL on contest val.bin: 41.72
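
The figures above can be sanity-checked with a few lines of arithmetic. The sketch below is not the training code. It assumes nanoGPT-style conventions that this card does not state: a padded vocabulary of 50304, bias-free Linear and LayerNorm layers, tied input/output embeddings, and a reported parameter count that excludes position embeddings. The helper names `count_params` and `cosine_lr` are hypothetical, and the schedule assumes no warmup because the card does not mention one.

```python
import math

def count_params(n_layer, n_embd, vocab_size, block_size,
                 bias=False, include_pos_emb=False):
    """Parameter count for a GPT-2 style decoder (assumed layout, see lead-in)."""
    ln = n_embd * (2 if bias else 1)                 # LayerNorm weight (+ bias)
    attn = n_embd * 3 * n_embd + n_embd * n_embd     # QKV projection + output projection
    mlp = 2 * (n_embd * 4 * n_embd)                  # up-projection + down-projection
    if bias:
        attn += 3 * n_embd + n_embd
        mlp += 4 * n_embd + n_embd
    block = 2 * ln + attn + mlp
    # Final LayerNorm plus the token embedding (assumed tied with the LM head).
    total = n_layer * block + ln + vocab_size * n_embd
    if include_pos_emb:
        total += block_size * n_embd
    return total

def cosine_lr(it, max_lr=6e-4, min_lr=6e-5, decay_iters=10_000, warmup_iters=0):
    """Cosine decay from max_lr to min_lr; warmup_iters=0 is an assumption."""
    if it < warmup_iters:
        return max_lr * (it + 1) / (warmup_iters + 1)
    if it >= decay_iters:
        return min_lr
    progress = (it - warmup_iters) / (decay_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + coeff * (max_lr - min_lr)

# n_head does not change the parameter count, so it is not an argument here.
n_params = count_params(n_layer=8, n_embd=512, vocab_size=50304, block_size=1024)
print(f"{n_params / 1e6:.2f}M params")               # 50.93M params

# ~983M tokens over 10k iterations implies ~98.3k tokens per iteration,
# e.g. 96 sequences of 1024 tokens (one consistent split, not stated in the card).
tokens_per_iter = 96 * 1024
print(10_000 * tokens_per_iter)                      # 983040000

# Perplexity is exp(mean cross-entropy), so PPL 41.72 is a loss of about 3.73 nats.
val_loss = math.log(41.72)
print(f"{val_loss:.2f}")                             # 3.73
```

Under these assumptions the count comes to 50,930,176, which rounds to the 50.93M reported above. With unpadded vocabulary (50257) or with biases enabled the total would differ slightly.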

This is a snapshot for the milestone report, not the final submission.
