NanoGPT-OOM
Submission for the CSE 251B NanoGPT contest.
- Public validation perplexity: 28.6210
- Public validation loss: 3.354141
- Parameters: 98,939,904
- Training data: FineWeb-Edu sample-10BT, followed by a short mixed-data continuation

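The two validation figures are consistent with each other if the reported loss is a mean per-token cross-entropy in nats, in which case perplexity is simply its exponential. That reading is an assumption (the card does not say how the loss is averaged); under it, a quick check looks like this:

```python
import math

# Reported public validation loss from this card.
# Assumption: mean per-token cross-entropy, natural log (nats).
val_loss = 3.354141

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(val_loss)

print(f"perplexity = {perplexity:.4f}")
```

The result lands close to the reported 28.6210, which suggests the two numbers come from the same evaluation run.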
Repository contents:
- `checkpoint.pt`: trained model checkpoint
- `model.py`: model definition with the required `load_model(checkpoint_path, device)` interface
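
A minimal sketch of how a grader or user might call the interface, assuming both files sit in one local directory. The helper name `load_submission` and the module alias are hypothetical, and the card does not state what `load_model` returns, so the result is passed through unchanged:

```python
import importlib.util
from pathlib import Path


def load_submission(repo_dir, device="cpu"):
    """Import model.py from repo_dir and call its load_model().

    Hypothetical helper: only the load_model(checkpoint_path, device)
    signature comes from the card; everything else is an assumption.
    """
    repo = Path(repo_dir)

    # Load model.py as a module directly from its file path.
    spec = importlib.util.spec_from_file_location(
        "submission_model", repo / "model.py"
    )
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # Return whatever load_model returns; its type is not documented here.
    return module.load_model(str(repo / "checkpoint.pt"), device)
```

Loading by file path avoids modifying `sys.path`, which keeps the submission's `model.py` from shadowing any other module of the same name.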