Source code

This is a small language model trained on the text of Julius Caesar. The source code of the model is available at - Link to the codebase

The model configurations are as follows -

Model configuration

  1. learning rate = 1e-4
  2. max iters = 10000
  3. warmup steps = 2000
  4. min lr = 5e-4
  5. eval iters = 500
  6. batch size = 8
  7. block size = 128
  8. vocab size = 50257
  9. number of layers = 4
  10. number of heads = 4
  11. embedding dimension = 768
  12. dropout = 0.01
  13. bias = True
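For reference, the settings above can be sketched as plain Python config dataclasses in the style commonly used for small GPT implementations. The field names and the split into model vs. training config are assumptions for illustration; the actual codebase may organize them differently.

```python
from dataclasses import dataclass

@dataclass
class GPTConfig:
    # Model architecture settings from the list above (names are assumed)
    block_size: int = 128        # context length
    vocab_size: int = 50257      # GPT-2 BPE vocabulary size
    n_layer: int = 4             # number of transformer layers
    n_head: int = 4              # number of attention heads
    n_embd: int = 768            # embedding dimension
    dropout: float = 0.01
    bias: bool = True            # use bias terms in Linear/LayerNorm

@dataclass
class TrainConfig:
    # Optimization settings from the list above (names are assumed)
    learning_rate: float = 1e-4
    max_iters: int = 10_000
    warmup_steps: int = 2_000
    min_lr: float = 5e-4
    eval_iters: int = 500
    batch_size: int = 8

model_cfg = GPTConfig()
train_cfg = TrainConfig()
print(model_cfg.n_embd, train_cfg.batch_size)
```

Note that with an embedding dimension of 768, each head attends over 768 / 4 = 192 dimensions.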

Test vs. validation loss
