NanoGPT-OOM

Submission for the CSE 251B NanoGPT contest.

  • Public validation perplexity: 28.6210
  • Public validation loss: 3.354141
  • Parameters: 98,939,904
  • Training data: FineWeb-Edu sample-10BT, followed by a short mixed-data continuation
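
The two validation figures above appear to be related by the usual identity perplexity = exp(loss). This assumes the reported loss is the mean per-token cross-entropy in nats, which the card does not state. A minimal check under that assumption:

```python
import math

# Values copied from the metrics listed above.
reported_loss = 3.354141
reported_perplexity = 28.6210

# Assumption: the loss is mean per-token cross-entropy in nats,
# so perplexity is its exponential.
computed_perplexity = math.exp(reported_loss)

print(f"computed perplexity: {computed_perplexity:.4f}")
print(f"difference from reported: {abs(computed_perplexity - reported_perplexity):.6f}")
```

If the loss were measured in bits instead, the base would be 2 rather than e and the two numbers would not agree.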

Repository contents:

  • checkpoint.pt: trained model checkpoint
  • model.py: model definition with the required load_model(checkpoint_path, device) interface
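
One way to call the load_model(checkpoint_path, device) interface is sketched below. The helper name load_from_repo, the local directory layout, and passing the device as a plain string such as "cpu" are assumptions for illustration. The card does not say what load_model returns or which device types it accepts.

```python
import importlib.util
from pathlib import Path


def load_from_repo(repo_dir, device="cpu"):
    """Hypothetical helper: import model.py from repo_dir and call load_model.

    Assumes repo_dir holds model.py and checkpoint.pt side by side, as in
    the repository contents listed above.
    """
    repo_dir = Path(repo_dir)

    # Import model.py by file path so the directory need not be on sys.path.
    spec = importlib.util.spec_from_file_location("model", repo_dir / "model.py")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # Call the documented interface with the checkpoint path and device.
    return module.load_model(str(repo_dir / "checkpoint.pt"), device)
```

For example, after downloading both files into a directory named nanogpt-oom (a placeholder name), load_from_repo("nanogpt-oom", "cpu") would hand back whatever load_model returns.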