gpjt committed · Commit 273db09 · verified · 1 Parent(s): ec485b9

Update README.md

Files changed (1): README.md (+4 -3)
README.md CHANGED
@@ -40,7 +40,7 @@ LLM, please do feel free to play with it!
 ### Model Sources
 
 - **Repository:** [gpjt/ddp-base-model-from-scratch](https://github.com/gpjt/ddp-base-model-from-scratch)
-- **Blog post:** [Writing an LLM from scratch, part 29 -- using DistributedDataParallel to train a base model from scratch in the cloud](https://www.gilesthomas.com/2026/01/llm-from-scratch-29-ddp-training-a-base-model-in-the-cloud)
+- **Blog post:** [Writing an LLM from scratch, part 32b -- Interventions: gradient clipping](https://www.gilesthomas.com/2026/02/llm-from-scratch-32b-interventions-gradient-clipping)
 
 ## How to Get Started with the Model
 
@@ -78,9 +78,10 @@ number of tokens. It's [both dumb and ignorant](https://www.gilesthomas.com/202
 
 ## Training Details
 
-- **Machine type:** TODO
+- **Machine type:** 8x A100 with 40GiB per GPU
 - **Tokens:** 3,260,190,720 (Chinchilla-optimal of 20x parameters) rounded up to the nearest batch.
 - **Dataset:** [gpjt/fineweb-gpt2-tokens](https://huggingface.co/datasets/gpjt/fineweb-gpt2-tokens)
 - **Micro-batch size:** 12
-- **Global batch size:** TODO
+- **Global batch size:** 96
 - **Dropout:** 0.1
+- **Gradient clipping:** at 3.5
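The gradient-clipping setting added in this commit is global-norm clipping: in PyTorch it is typically a single call to `torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=3.5)` placed between the backward pass and the optimizer step. A minimal dependency-free sketch of the same idea (the function name and the list-of-floats gradient layout are illustrative, not taken from the repo):

```python
import math

def clip_grad_norm(grads, max_norm):
    """Global-norm gradient clipping: if the L2 norm of all gradients
    taken together exceeds max_norm, scale every gradient down so the
    combined norm equals max_norm. Returns (clipped_grads, total_norm)."""
    total_norm = math.sqrt(sum(g * g for tensor in grads for g in tensor))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [[g * scale for g in tensor] for tensor in grads]
    return grads, total_norm

# Example: two "parameter tensors" with combined L2 norm 5.0, clipped to 3.5.
clipped, norm = clip_grad_norm([[3.0], [4.0]], 3.5)
```

The scaling preserves the gradient's direction while capping its magnitude, which limits the damage from occasional loss spikes during training without biasing typical updates.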