michaelbzhu committed on
Commit c5c82ce · verified · 1 Parent(s): dae91fa

Update README.md

  1. README.md +16 -1
README.md CHANGED
@@ -9,4 +9,19 @@ language:
   - en
   ---
 
- 3.2B parameter base model trained for ~64B tokens from the FineWeb dataset
+ 3.2B parameter base model trained for ~64B tokens from the FineWeb dataset
+
+ uses gpt2 tokenizer from tiktoken
+
+ [wandb training metrics](https://api.wandb.ai/links/teammapo-mapo-labs/zooq3iig)
+ - note: increased batch size from 8 to 512 at step 2,160,000
+ - Final checkpoint: step 2,187,000, val_loss: 2.7489
+ - Trained on a 8xH100 80GB node using data parallel
+
+ ```
+ "d_head": 128,
+ "d_model": 8192,
+ "n_heads": 64,
+ "n_layers": 3,
+ "n_vocab": 50257
+ ```
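
The figures in the added README lines can be sanity-checked with a little arithmetic. The sketch below is not taken from the repository; it is a rough estimate under assumptions the README does not state: a standard transformer block with four attention projections, an MLP with a 4x hidden width, untied input and output embeddings, and no biases, layer norms, or positional parameters counted. For the token count it further assumes that "batch size" means sequences per step and that each sequence holds 2048 tokens. The names `estimate_params` and `estimate_tokens` are hypothetical helpers introduced for illustration.

```python
# Config values copied from the README diff above.
CONFIG = {"d_head": 128, "d_model": 8192, "n_heads": 64, "n_layers": 3, "n_vocab": 50257}


def estimate_params(cfg, mlp_ratio=4, tied_embeddings=False):
    """Rough weight count. mlp_ratio and tied_embeddings are assumptions."""
    d_model = cfg["d_model"]
    d_attn = cfg["n_heads"] * cfg["d_head"]
    attn = 4 * d_model * d_attn  # Q, K, V and output projections
    mlp = 2 * d_model * mlp_ratio * d_model  # up and down projections
    embed = cfg["n_vocab"] * d_model
    embed_total = embed if tied_embeddings else 2 * embed
    return cfg["n_layers"] * (attn + mlp) + embed_total


def estimate_tokens(seq_len=2048, switch_step=2_160_000, final_step=2_187_000,
                    small_batch=8, large_batch=512):
    """Tokens seen, assuming batch size counts sequences of seq_len tokens."""
    sequences = switch_step * small_batch + (final_step - switch_step) * large_batch
    return sequences * seq_len


if __name__ == "__main__":
    print(f"params ~ {estimate_params(CONFIG) / 1e9:.2f}B")
    print(f"tokens ~ {estimate_tokens() / 1e9:.1f}B")
```

Under these assumptions the estimate comes to about 3.24B parameters and about 63.7B tokens, close to the stated 3.2B and ~64B. That agreement is suggestive but does not confirm the assumed MLP width, embedding tying, or sequence length.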