pranikz committed on
Commit 3f5dcff · verified · 1 Parent(s): e5d678d

Update README.md

Files changed (1)
  1. README.md +35 -1
README.md CHANGED
@@ -1,5 +1,12 @@
1   ---
2   license: mit
3   ---
4   # 🚀 Small Language Model (SLM) from Scratch - Explained
5
@@ -222,6 +229,33 @@ plt.show()
222
223   ---
224
225   ## 6. Inference
226
227   ```python
@@ -251,4 +285,4 @@ For practical testing, use **200–500 tokens**.
251   - **Evaluation**: Loss curves (train vs val).
252   - **Inference**: Autoregressive generation with temperature & top-k control.
253
254 - This is essentially a **mini GPT-2 clone**, scaled down for small datasets like movie scripts.
 
1   ---
2   license: mit
3 + datasets:
4 + - IsmaelMousa/movies
5 + tags:
6 + - movie
7 + - short_stories
8 + - llm
9 + - slm
10   ---
11   # 🚀 Small Language Model (SLM) from Scratch - Explained
12
 
229
230   ---
231
232 + ## 📊 Training Metrics
233 +
234 + | Epoch | Train Loss | Val Loss | Perplexity |
235 + |-------|------------|----------|------------|
236 + | 500 | 6.0358 | 6.0601 | 430.1 |
237 + | 1000 | 5.0690 | 5.1143 | 166.0 |
238 + | 1500 | 4.3162 | 4.3407 | 76.7 |
239 + | 2000 | 3.5948 | 3.6099 | 36.9 |
240 + | 2500 | 3.0460 | 3.0569 | 21.3 |
241 + | 3000 | 2.7518 | 2.7398 | 15.5 |
242 + | 3500 | 2.5606 | 2.5574 | 12.9 |
243 + | 4000 | 2.4583 | 2.4691 | 11.8 |
244 + | 4500 | 2.3943 | 2.3969 | 11.0 |
245 + | 5000 | 2.3428 | 2.3513 | 10.5 |
246 + | 6000 | 2.2141 | 2.2155 | 9.17 |
247 + | 7000 | 2.1389 | 2.1577 | 8.65 |
248 + | 8000 | 2.0570 | 2.0703 | 7.93 |
249 + | 9000 | 2.0062 | 2.0210 | 7.55 |
250 + | 10000 | 1.9604 | 1.9715 | 7.18 |
251 + | 12000 | 1.8580 | 1.8924 | 6.64 |
252 + | 14000 | 1.7954 | 1.8284 | 6.23 |
253 + | 16000 | 1.7369 | 1.7937 | 5.95 |
254 + | 18000 | 1.6901 | 1.7314 | 5.65 |
255 + | 19500 | 1.6594 | 1.7216 | 5.60 |
256 +
257 + 📉 Validation loss steadily decreases, and **perplexity drops from ~430 → ~5.6** over training.
258 +
259   ## 6. Inference
260
261   ```python
 
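Review note on the Training Metrics table added in this hunk: the Perplexity column tracks the exponential of the Val Loss column closely. The sketch below checks a few rows under the assumption that the loss is a mean per-token cross-entropy in nats. The diff does not state how perplexity was computed, and the helper name `perplexity` is illustrative rather than taken from the README.

```python
import math

# (epoch as labeled in the table, validation loss, reported perplexity),
# copied from four rows of the Training Metrics table.
rows = [
    (1500, 4.3407, 76.7),
    (5000, 2.3513, 10.5),
    (10000, 1.9715, 7.18),
    (19500, 1.7216, 5.60),
]


def perplexity(mean_loss_nats: float) -> float:
    """Perplexity as exp of the mean per-token cross-entropy (natural log).

    Assumption: the table's losses are in nats, which matches the sampled rows.
    """
    return math.exp(mean_loss_nats)


for epoch, val_loss, reported in rows:
    computed = perplexity(val_loss)
    print(f"{epoch:>6}  computed={computed:7.2f}  reported={reported}")
```

The sampled rows agree to within rounding, so under this assumption the perplexity column adds no information beyond the validation loss; it restates it on a scale that is easier to compare across runs.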
285   - **Evaluation**: Loss curves (train vs val).
286   - **Inference**: Autoregressive generation with temperature & top-k control.
287
288 + This is essentially a **mini GPT-2 clone**, scaled down for small datasets like movie scripts.
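
The **Inference** bullet above mentions autoregressive generation with temperature and top-k control, but the README's own generation code is cut off by the diff. As a rough illustration of that idea only, here is a dependency-free sketch. The function names `sample_next_token` and `generate`, and the `next_logits` callback, are hypothetical and stand in for the model's forward pass; the README most likely uses a tensor library instead.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Sample one token id from raw logits using temperature and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Temperature scaling: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [x / temperature for x in logits]
    # Rank token ids by score, then keep only the k best when top_k is given.
    ids = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k is not None:
        ids = ids[:max(1, top_k)]
    # Unnormalized softmax weights over the kept ids (max subtracted for stability).
    best = scaled[ids[0]]
    weights = [math.exp(scaled[i] - best) for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


def generate(next_logits, prompt_ids, max_new_tokens, **sampling_kwargs):
    """Autoregressive loop: feed the growing sequence back in, one token at a time.

    `next_logits` is a hypothetical callable mapping a list of token ids to a
    list of logits for the next position.
    """
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        ids.append(sample_next_token(next_logits(ids), **sampling_kwargs))
    return ids
```

With `top_k=1` the sampler reduces to greedy decoding, which is a convenient sanity check; larger `top_k` and higher `temperature` trade determinism for variety.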