Update README.md
## Training Vs Validation loss function plot

README.md
CHANGED
|
@@ -1,3 +1,23 @@
|
|
| 1 |
-
---
|
| 2 |
-
license: apache-2.0
|
| 3 |
-
---
|
| 1 |
+
---
|
| 2 |
+
license: apache-2.0
|
| 3 |
+
---
|
| 4 |
+
## Source code
|
| 5 |
+
This is a small language model trained on the Julius Caesar text. The source code of the model is available at:
|
| 6 |
+
[Link to the codebase](https://colab.research.google.com/github/samratkar/samratkar.github.io/blob/main/_posts/concepts/genai/notes-codes/slm-from-scratch/slm-jc.ipynb)
|
| 7 |
+
|
| 8 |
+
The model configuration is as follows:
|
| 9 |
+
## Model configuration
|
| 10 |
+
1. learning rate = 1e-4
|
| 11 |
+
2. max iters = 10000
|
| 12 |
+
3. warmup steps = 2000
|
| 13 |
+
4. min lr = 5e-4
|
| 14 |
+
5. eval iters = 500
|
| 15 |
+
6. batch size = 8
|
| 16 |
+
7. block size = 128
|
| 17 |
+
8. vocab size = 50257
|
| 18 |
+
9. block size = 128
|
| 19 |
+
10. number of layers = 4
|
| 20 |
+
11. number of heads = 4
|
| 21 |
+
12. embedding dimension = 768
|
| 22 |
+
13. dropout = 0.01
|
| 23 |
+
14. bias = True
|
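The configuration listed above could be gathered into two small Python dataclasses, as in the minimal sketch below. The class names, field names, and helper functions are hypothetical and are not taken from the linked notebook; only the numeric values come from the list. The two helpers assume that the embedding is split evenly across attention heads and that no gradient accumulation is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    # Values copied from the configuration list; field names are hypothetical.
    vocab_size: int = 50257
    block_size: int = 128
    n_layer: int = 4
    n_head: int = 4
    n_embd: int = 768
    dropout: float = 0.01
    bias: bool = True


@dataclass(frozen=True)
class TrainConfig:
    # Values copied from the configuration list; field names are hypothetical.
    learning_rate: float = 1e-4
    max_iters: int = 10000
    warmup_steps: int = 2000
    min_lr: float = 5e-4
    eval_iters: int = 500
    batch_size: int = 8


def head_dim(cfg: ModelConfig) -> int:
    """Per-head width, assuming the embedding is split evenly across heads."""
    if cfg.n_embd % cfg.n_head != 0:
        raise ValueError("n_embd must be divisible by n_head")
    return cfg.n_embd // cfg.n_head


def tokens_per_step(model: ModelConfig, train: TrainConfig) -> int:
    """Tokens processed per optimizer step, assuming no gradient accumulation."""
    return train.batch_size * model.block_size


if __name__ == "__main__":
    model_cfg = ModelConfig()
    train_cfg = TrainConfig()
    print("head dimension:", head_dim(model_cfg))                    # 768 // 4
    print("tokens per step:", tokens_per_step(model_cfg, train_cfg))  # 8 * 128
```

The sketch copies the listed values unchanged, including a min lr (5e-4) that is larger than the learning rate (1e-4).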