Update README.md
--- a/README.md
+++ b/README.md
@@ -18,6 +18,7 @@ uses gpt2 tokenizer from tiktoken
 - Final checkpoint: step 2,187,000, val_loss: 2.7489
 - Trained on a 8xH100 80GB node using data parallel
 
+Model config:
 ```
 "d_head": 128,
 "d_model": 8192,