# slm_julius_caesar
---
license: apache-2.0
---

## Source code

This is a small language model trained on the text of *Julius Caesar*. The model's source code is available at: Link to the codebase

## Model configuration

The model and training configuration is as follows:

  1. learning rate = 1e-4
  2. max iters = 10000
  3. warmup steps = 2000
  4. min lr = 5e-4
  5. eval iters = 500
  6. batch size = 8
  7. block size = 128
  8. vocab size = 50257
  9. number of layers = 4
  10. number of heads = 4
  11. embedding dimension = 768
  12. dropout = 0.01
  13. bias = True
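The hyperparameters above can be collected into a nanoGPT-style configuration sketch. The dataclass and field names below are assumptions for illustration, not taken from the codebase; the `get_lr` function shows one common way the `warmup steps` and `min lr` settings are typically used (linear warmup, then cosine decay). Note that the listed min lr (5e-4) is larger than the peak learning rate (1e-4), so under this schedule the rate would decay upward; the values are reproduced here exactly as listed.

```python
import math
from dataclasses import dataclass

# Hypothetical config classes mirroring the listed hyperparameters.
@dataclass
class ModelConfig:
    vocab_size: int = 50257   # GPT-2 BPE vocabulary size
    block_size: int = 128     # context length
    n_layer: int = 4          # number of transformer layers
    n_head: int = 4           # number of attention heads
    n_embd: int = 768         # embedding dimension
    dropout: float = 0.01
    bias: bool = True

@dataclass
class TrainConfig:
    learning_rate: float = 1e-4   # peak learning rate
    max_iters: int = 10000
    warmup_steps: int = 2000
    min_lr: float = 5e-4          # reproduced as listed (exceeds the peak rate)
    eval_iters: int = 500
    batch_size: int = 8
    block_size: int = 128

def get_lr(it: int, cfg: TrainConfig) -> float:
    """Linear warmup to the peak rate, then cosine decay toward min_lr."""
    if it < cfg.warmup_steps:
        return cfg.learning_rate * (it + 1) / cfg.warmup_steps
    progress = (it - cfg.warmup_steps) / (cfg.max_iters - cfg.warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * min(progress, 1.0)))
    return cfg.min_lr + coeff * (cfg.learning_rate - cfg.min_lr)
```

With these values, `get_lr` returns the peak rate (1e-4) at the end of warmup and `min_lr` at `max_iters`.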

## Test vs. validation loss