|
|
--- |
|
|
license: apache-2.0 |
|
|
--- |
|
|
## Source code |
|
|
This is a small language model trained from scratch on the text of *Julius Caesar*. The source code of the model is available at:
|
|
[Link to the codebase](https://colab.research.google.com/github/samratkar/samratkar.github.io/blob/main/_posts/concepts/genai/notes-codes/slm-from-scratch/slm-jc.ipynb) |
|
|
|
|
|
## Model configuration


The model was trained with the following hyperparameters:
|
|
1. learning rate = 1e-4
2. max iters = 10000
3. warmup steps = 2000
4. min lr = 5e-4
5. eval iters = 500
6. batch size = 8
7. block size = 128
8. vocab size = 50257
9. number of layers = 4
10. number of heads = 4
11. embedding dimension = 768
12. dropout = 0.01
13. bias = True
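The hyperparameters above can be collected into a single configuration object, as is common in nanoGPT-style implementations. This is a minimal sketch; the class and field names (`SLMConfig`, `n_layer`, `n_embd`, etc.) are illustrative assumptions and are not taken from the notebook:

```python
from dataclasses import dataclass

@dataclass
class SLMConfig:
    # Training hyperparameters (as listed in this card)
    learning_rate: float = 1e-4
    max_iters: int = 10000
    warmup_steps: int = 2000
    min_lr: float = 5e-4      # note: larger than learning_rate as listed above
    eval_iters: int = 500
    batch_size: int = 8
    # Model architecture
    block_size: int = 128     # context length in tokens
    vocab_size: int = 50257   # GPT-2 BPE vocabulary size
    n_layer: int = 4          # number of transformer layers
    n_head: int = 4           # attention heads per layer
    n_embd: int = 768         # embedding dimension
    dropout: float = 0.01
    bias: bool = True         # use bias terms in linear/layer-norm modules
```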
|
|
|
|
|
## Train vs validation loss
|
|
 |