gpjt committed · verified · Commit 53ea651 · 1 parent: 87db882

Updated TODOs.

Files changed (1): README.md (+3 -3)
README.md CHANGED

@@ -39,7 +39,7 @@ LLM, please do feel free to play with it!
 
 ### Model Sources
 
-- **Repository:** [gpjt/ddp-base-model-from-scratch](https://github.com/gpjt/ddp-base-model-from-scratch)
+- **Repository:** [gpjt/ddp-base-model-from-scratch](https://github.com/gpjt/ddp-base-model-from-scratch) (this is the model from "First train on a big instance: 8x A100, 40 GiB/GPU, SXM4")
 - **Blog post:** [Writing an LLM from scratch, part 29 -- using DistributedDataParallel to train a base model from scratch in the cloud](https://www.gilesthomas.com/2026/01/llm-from-scratch-29-ddp-training-a-base-model-in-the-cloud)
 
 ## How to Get Started with the Model
@@ -78,9 +78,9 @@ number of tokens. It's [both dumb and ignorant](https://www.gilesthomas.com/202
 
 ## Training Details
 
-- **Machine type:** TODO
+- **Machine type:** 8x A100, 40 GiB/GPU, SXM4
 - **Tokens:** 3,260,190,720 (Chinchilla-optimal of 20x parameters) rounded up to the nearest batch.
 - **Dataset:** [gpjt/fineweb-gpt2-tokens](https://huggingface.co/datasets/gpjt/fineweb-gpt2-tokens)
 - **Micro-batch size:** 13
-- **Global batch size:** TODO
+- **Global batch size:** 104
 - **Dropout:** 0.1
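The filled-in global batch size follows directly from the other numbers in the diff: with DistributedDataParallel, each GPU processes its own micro-batch per step, so the effective batch is the micro-batch size times the number of GPUs. A minimal sketch of that arithmetic, assuming no gradient accumulation (the `grad_accum_steps` name is illustrative, not taken from the repository):

```python
# Sketch: how the global batch size of 104 follows from the per-GPU
# micro-batch and the GPU count under DDP. Assumes a gradient
# accumulation factor of 1, which is consistent with 13 x 8 = 104.
micro_batch_size = 13  # sequences per GPU per step (from the README diff)
num_gpus = 8           # 8x A100 machine (from the README diff)
grad_accum_steps = 1   # assumption: no gradient accumulation

global_batch_size = micro_batch_size * num_gpus * grad_accum_steps
print(global_batch_size)  # 104
```

If gradient accumulation were used, the global batch would scale by the accumulation factor as well; the commit's value matches the simple one-step case.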