Update README.md
README.md
language:
- en
---

3.2B-parameter base model trained on ~64B tokens from the FineWeb dataset.

Uses the `gpt2` tokenizer from `tiktoken`.

[wandb training metrics](https://api.wandb.ai/links/teammapo-mapo-labs/zooq3iig)

- Batch size was increased from 8 to 512 at step 2,160,000.
- Final checkpoint: step 2,187,000, val_loss 2.7489.
- Trained on an 8xH100 80GB node using data parallelism.
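The sketch below is a rough consistency check on these numbers. It counts how many sequences the run consumed and converts the final validation loss to perplexity. Several assumptions are not stated in the README:

* "batch size" means sequences per optimizer step. If it is per GPU instead, multiply by the 8 data-parallel replicas.
* val_loss is the mean cross-entropy in nats per token.
* `context_len` is a hypothetical parameter, because the context length is not given.

```python
import math

# Values taken from the README.
SWITCH_STEP = 2_160_000  # step at which batch size went from 8 to 512
FINAL_STEP = 2_187_000   # final checkpoint step
VAL_LOSS = 2.7489        # final validation loss

def sequences_seen(switch_step: int, final_step: int,
                   bs_before: int, bs_after: int) -> int:
    """Total sequences consumed, ASSUMING batch size = sequences per optimizer step."""
    return switch_step * bs_before + (final_step - switch_step) * bs_after

def tokens_seen(n_sequences: int, context_len: int) -> int:
    """Total tokens, ASSUMING every sequence is packed to exactly context_len tokens."""
    return n_sequences * context_len

n_seq = sequences_seen(SWITCH_STEP, FINAL_STEP, 8, 512)
print(n_seq)  # 31104000

# HYPOTHETICAL context length: the README does not state one.
print(tokens_seen(n_seq, 2048) / 1e9)  # about 63.7 (billion tokens)

# ASSUMING val_loss is cross-entropy in nats, perplexity = exp(loss).
print(round(math.exp(VAL_LOSS), 2))  # 15.63
```

With a hypothetical 2048-token context, the total comes to about 63.7B tokens. That is in line with the stated ~64B, but it only shows the figures are compatible. It does not show what the run actually used.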
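The README does not say which framework or launcher was used for data-parallel training, so the following is only a toy illustration of the idea, not the training code. Each of 8 hypothetical replicas computes a gradient on an equal shard of the batch. Averaging those local gradients, which is the job an all-reduce does in synchronous data parallelism, reproduces the full-batch gradient. The model here is a one-weight linear regression chosen purely for brevity.

```python
import random

def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the 1-D model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def data_parallel_grad(w, xs, ys, world_size):
    """Split the batch evenly across replicas, compute local gradients,
    then average them (what an all-reduce does in synchronous training)."""
    if len(xs) % world_size != 0:
        raise ValueError("batch must divide evenly across replicas")
    shard = len(xs) // world_size
    local = [
        grad_mse(w, xs[r * shard:(r + 1) * shard], ys[r * shard:(r + 1) * shard])
        for r in range(world_size)
    ]
    return sum(local) / world_size

random.seed(0)
xs = [random.uniform(-1, 1) for _ in range(64)]
ys = [3.0 * x + random.gauss(0, 0.1) for x in xs]

full = grad_mse(0.5, xs, ys)
dp = data_parallel_grad(0.5, xs, ys, world_size=8)
print(abs(full - dp) < 1e-12)  # True: equal shards give the same gradient
```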

```json
{
  "d_head": 128,
  "d_model": 8192,
  "n_heads": 64,
  "n_layers": 3,
  "n_vocab": 50257
}
```
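The config can be checked against the headline parameter count with a back-of-the-envelope estimate. The README does not describe the block architecture, so the sketch makes several assumptions:

* The model uses a GPT-2-style decoder block with four `d_model`×`d_model` attention projections.
* The MLP has a hypothetical 4× expansion ratio.
* Biases, LayerNorm weights and any positional embeddings are ignored.
* The output head is untied from the input embedding.

Under these assumptions the count comes to about 3.24B, consistent with "3.2B". Tying the embeddings would give about 2.83B.

```python
import json

CONFIG = json.loads("""
{
  "d_head": 128,
  "d_model": 8192,
  "n_heads": 64,
  "n_layers": 3,
  "n_vocab": 50257
}
""")

def estimate_params(cfg, mlp_ratio=4, tied_embeddings=False):
    """Rough parameter count; mlp_ratio and tying are ASSUMPTIONS, not from the README."""
    d, n_layers, vocab = cfg["d_model"], cfg["n_layers"], cfg["n_vocab"]
    # The attention heads should tile the model width exactly.
    assert cfg["n_heads"] * cfg["d_head"] == d
    attn = 4 * d * d                 # Q, K, V and output projections
    mlp = 2 * mlp_ratio * d * d      # up- and down-projections
    embed = vocab * d                # token embedding table
    head = 0 if tied_embeddings else vocab * d  # separate output projection
    return n_layers * (attn + mlp) + embed + head

print(estimate_params(CONFIG))  # 3239329792
```

Only 3 layers are used, so almost a quarter of the estimate (2 × 50257 × 8192) sits in the embedding and output matrices.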