Updated TODOs.
README.md
CHANGED
@@ -39,7 +39,7 @@ LLM, please do feel free to play with it!
 
 ### Model Sources
 
-- **Repository:** [gpjt/ddp-base-model-from-scratch](https://github.com/gpjt/ddp-base-model-from-scratch)
+- **Repository:** [gpjt/ddp-base-model-from-scratch](https://github.com/gpjt/ddp-base-model-from-scratch) (this is the model from "First train on a big instance: 8x A100, 40 GiB/GPU, SXM4")
 - **Blog post:** [Writing an LLM from scratch, part 29 -- using DistributedDataParallel to train a base model from scratch in the cloud](https://www.gilesthomas.com/2026/01/llm-from-scratch-29-ddp-training-a-base-model-in-the-cloud)
 
 ## How to Get Started with the Model
 
@@ -78,9 +78,9 @@ number of tokens. It's [both dumb and ignorant](https://www.gilesthomas.com/202
 
 ## Training Details
 
-- **Machine type:**
+- **Machine type:** 8x A100, 40 GiB/GPU, SXM4
 - **Tokens:** 3,260,190,720 (Chinchilla-optimal of 20x parameters) rounded up to the nearest batch.
 - **Dataset:** [gpjt/fineweb-gpt2-tokens](https://huggingface.co/datasets/gpjt/fineweb-gpt2-tokens)
 - **Micro-batch size:** 13
-- **Global batch size:** 
+- **Global batch size:** 104
 - **Dropout:** 0.1
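
The new Training Details values can be cross-checked with a little arithmetic. The sketch below assumes one micro-batch per GPU per optimizer step (no gradient accumulation), which is consistent with 8 GPUs x 13 = 104 but is not stated in the README. The sequence length of 1,024 is also an assumption (the usual GPT-2 context size), so the step count is illustrative only.

```python
# Values taken from the Training Details list in the diff above.
NUM_GPUS = 8                        # "8x A100"
MICRO_BATCH_SIZE = 13               # sequences per GPU per step
CHINCHILLA_TOKENS = 3_260_190_720   # stated as 20x the parameter count

# ASSUMPTION (not in the README): GPT-2-style context length.
SEQ_LEN = 1024

# ASSUMPTION: each GPU processes one micro-batch per step, with no
# gradient accumulation, so the global batch is a simple product.
global_batch_size = NUM_GPUS * MICRO_BATCH_SIZE

# Parameter count implied by the "20x parameters" rule of thumb.
implied_parameters = CHINCHILLA_TOKENS // 20

# Tokens consumed per optimizer step, and the number of steps needed
# to cover the token budget when rounding up to a whole batch.
tokens_per_step = global_batch_size * SEQ_LEN
steps = -(-CHINCHILLA_TOKENS // tokens_per_step)  # integer ceiling division

print(f"global batch size:  {global_batch_size}")
print(f"implied parameters: {implied_parameters:,}")
print(f"tokens per step:    {tokens_per_step:,}")
print(f"steps (rounded up): {steps:,}")
```

If the real run used a different sequence length or gradient accumulation, only `tokens_per_step` and `steps` change; the global batch size of 104 follows directly from the two values listed in the README.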