gpjt committed · verified · Commit 53ea651 · 1 parent: 87db882

Updated TODOs.

Files changed (1): README.md (+3 -3)
README.md CHANGED

@@ -39,7 +39,7 @@ LLM, please do feel free to play with it!
 
 ### Model Sources
 
-- **Repository:** [gpjt/ddp-base-model-from-scratch](https://github.com/gpjt/ddp-base-model-from-scratch)
+- **Repository:** [gpjt/ddp-base-model-from-scratch](https://github.com/gpjt/ddp-base-model-from-scratch) (this is the model from "First train on a big instance: 8x A100, 40 GiB/GPU, SXM4")
 - **Blog post:** [Writing an LLM from scratch, part 29 -- using DistributedDataParallel to train a base model from scratch in the cloud](https://www.gilesthomas.com/2026/01/llm-from-scratch-29-ddp-training-a-base-model-in-the-cloud)
 
 ## How to Get Started with the Model
@@ -78,9 +78,9 @@ number of tokens. It's [both dumb and ignorant](https://www.gilesthomas.com/202
 
 ## Training Details
 
-- **Machine type:** TODO
+- **Machine type:** 8x A100, 40 GiB/GPU, SXM4
 - **Tokens:** 3,260,190,720 (Chinchilla-optimal of 20x parameters) rounded up to the nearest batch.
 - **Dataset:** [gpjt/fineweb-gpt2-tokens](https://huggingface.co/datasets/gpjt/fineweb-gpt2-tokens)
 - **Micro-batch size:** 13
-- **Global batch size:** TODO
+- **Global batch size:** 104
 - **Dropout:** 0.1
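The filled-in global batch size follows directly from the other numbers in the diff: with DistributedDataParallel, each GPU processes its own micro-batch per step, so the effective batch is the micro-batch size times the number of GPUs. A minimal sketch of that arithmetic, assuming no gradient accumulation (the `grad_accum_steps` name is illustrative, not taken from the repository):

```python
# Sketch: how the global batch size of 104 follows from the per-GPU
# micro-batch and the GPU count under DDP. Assumes a gradient
# accumulation factor of 1, which is consistent with 13 x 8 = 104.
micro_batch_size = 13  # sequences per GPU per step (from the README diff)
num_gpus = 8           # 8x A100 machine (from the README diff)
grad_accum_steps = 1   # assumption: no gradient accumulation

global_batch_size = micro_batch_size * num_gpus * grad_accum_steps
print(global_batch_size)  # 104
```

If gradient accumulation were used, the global batch would scale by the accumulation factor as well; the commit's value matches the simple one-step case.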