NanoGPT-OOM

Submission for the CSE 251B NanoGPT contest.

Public validation perplexity: 28.6210
Public validation loss: 3.354141
Parameters: 98,939,904
Training data: FineWeb-Edu sample-10BT, followed by a short mixed-data continuation
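
The two public validation figures above can be cross-checked against each other. This is a minimal sketch, assuming the reported loss is the mean per-token cross-entropy in nats, in which case perplexity is simply its exponential. The variable names are illustrative and not taken from the submission code.

```python
import math

# Public validation loss as reported above (assumed: mean cross-entropy in nats).
val_loss = 3.354141

# Under that assumption, perplexity is exp(loss).
perplexity = math.exp(val_loss)

print(f"perplexity = {perplexity:.4f}")
```

Under this assumption the computed value agrees with the reported perplexity of about 28.62.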

Repository contents:

checkpoint.pt: trained model checkpoint
model.py: model definition with the required load_model(checkpoint_path, device) interface
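
As a rough illustration of how the two files might be used together, the sketch below imports model.py from a local copy of the repository and calls load_model(checkpoint_path, device) on checkpoint.pt. The helper name load_submission and the assumption that both files sit in the same directory are hypothetical. What load_model returns is not documented here, so the helper passes the result through unchanged.

```python
import importlib.util
from pathlib import Path


def load_submission(repo_dir, device="cpu"):
    """Hypothetical helper: import model.py from repo_dir and call load_model.

    Assumes model.py and checkpoint.pt are both located directly in repo_dir.
    """
    repo_dir = Path(repo_dir)

    # Load model.py as a module from an explicit file path.
    spec = importlib.util.spec_from_file_location(
        "submission_model", repo_dir / "model.py"
    )
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # Call the interface named in the repository contents.
    return module.load_model(str(repo_dir / "checkpoint.pt"), device)
```

A call such as load_submission(".", device="cpu") would then return whatever load_model produces for this checkpoint.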
