---
license: mit
datasets:
- IsmaelMousa/movies
tags:
- movie
- short_stories
- llm
- slm
---

# Small Language Model (SLM) from Scratch – Explained
---

## Training Metrics

| Epoch | Train Loss | Val Loss | Perplexity |
|-------|------------|----------|------------|
| 500   | 6.0358     | 6.0601   | 430.1      |
| 1000  | 5.0690     | 5.1143   | 166.0      |
| 1500  | 4.3162     | 4.3407   | 76.7       |
| 2000  | 3.5948     | 3.6099   | 36.9       |
| 2500  | 3.0460     | 3.0569   | 21.3       |
| 3000  | 2.7518     | 2.7398   | 15.5       |
| 3500  | 2.5606     | 2.5574   | 12.9       |
| 4000  | 2.4583     | 2.4691   | 11.8       |
| 4500  | 2.3943     | 2.3969   | 11.0       |
| 5000  | 2.3428     | 2.3513   | 10.5       |
| 6000  | 2.2141     | 2.2155   | 9.17       |
| 7000  | 2.1389     | 2.1577   | 8.65       |
| 8000  | 2.0570     | 2.0703   | 7.93       |
| 9000  | 2.0062     | 2.0210   | 7.55       |
| 10000 | 1.9604     | 1.9715   | 7.18       |
| 12000 | 1.8580     | 1.8924   | 6.64       |
| 14000 | 1.7954     | 1.8284   | 6.23       |
| 16000 | 1.7369     | 1.7937   | 5.95       |
| 18000 | 1.6901     | 1.7314   | 5.65       |
| 19500 | 1.6594     | 1.7216   | 5.60       |

Validation loss steadily decreases, and **perplexity drops from ~430 to ~5.6** over training.
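The reported perplexity is (approximately) the exponential of the validation cross-entropy loss, which is a quick way to sanity-check the table; a minimal check, using values taken from the table above:

```python
import math

# Perplexity = exp(cross-entropy loss); spot-check a few rows of the table.
for step, val_loss in [(5000, 2.3513), (6000, 2.2155), (19500, 1.7216)]:
    ppl = math.exp(val_loss)
    print(f"step {step:>5}: val loss {val_loss:.4f} -> perplexity {ppl:.2f}")
```

The recomputed values agree with the table to within rounding.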
## 6. Inference
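The inference code itself is elided in this excerpt; as a rough sketch of the temperature and top-k sampling it describes (function and variable names here are illustrative, not the repository's actual API):

```python
import torch

def sample_next_token(logits: torch.Tensor, temperature: float = 1.0, top_k: int = 50) -> int:
    """Pick the next token id from raw logits via temperature + top-k sampling.

    `logits` is a 1-D tensor of vocabulary scores. Illustrative sketch only.
    """
    logits = logits / max(temperature, 1e-8)       # temperature scaling
    if top_k is not None:
        k = min(top_k, logits.size(-1))
        kth = torch.topk(logits, k).values[-1]     # k-th largest logit
        logits = logits.masked_fill(logits < kth, float("-inf"))
    probs = torch.softmax(logits, dim=-1)          # renormalize survivors
    return int(torch.multinomial(probs, num_samples=1).item())

# Tiny demo over a 5-token "vocabulary"
demo_logits = torch.tensor([2.0, 1.0, 0.5, -1.0, -3.0])
next_id = sample_next_token(demo_logits, temperature=0.8, top_k=3)
print(next_id)
```

Lower temperature sharpens the distribution toward the top logit, while top-k zeroes out everything outside the k most likely tokens before sampling.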
- **Evaluation**: Loss curves (train vs val).
- **Inference**: Autoregressive generation with temperature & top-k control.

This is essentially a **mini GPT-2 clone**, scaled down for small datasets like movie scripts.