### Model Sources

- **Repository:** [gpjt/ddp-base-model-from-scratch](https://github.com/gpjt/ddp-base-model-from-scratch)
- **Blog post:** [Writing an LLM from scratch, part 32b -- Interventions: gradient clipping](https://www.gilesthomas.com/2026/02/llm-from-scratch-32b-interventions-gradient-clipping)

## How to Get Started with the Model

## Training Details

- **Machine type:** 8x A100, with 40GiB per GPU
- **Tokens:** 3,260,190,720 (Chinchilla-optimal 20x parameters, rounded up to the nearest batch)
- **Dataset:** [gpjt/fineweb-gpt2-tokens](https://huggingface.co/datasets/gpjt/fineweb-gpt2-tokens)
- **Micro-batch size:** 12
- **Global batch size:** 96
- **Dropout:** 0.1
- **Gradient clipping:** at 3.5
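The batch-size and clipping settings above fit together as follows: 12 examples per GPU across the 8 A100s gives the global batch of 96, and clipping "at 3.5" is clip-by-global-norm, the operation `torch.nn.utils.clip_grad_norm_(model.parameters(), 3.5)` performs in a PyTorch loop. A minimal sketch of that operation on plain lists (illustrative only, not code from the repository):

```python
import math

# Illustrative sketch only -- not code from the repository.
# Clip-by-global-norm at 3.5: if the combined L2 norm of all gradients
# exceeds max_norm, every gradient is scaled down by the same factor so
# the combined norm equals max_norm; otherwise gradients pass through.
def clip_grad_norm(grads, max_norm=3.5):
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [[g * scale for g in vec] for vec in grads]
    return grads, total_norm

# Global batch size follows from the hardware: 12 per GPU x 8 GPUs = 96.
micro_batch_size, num_gpus = 12, 8
print(micro_batch_size * num_gpus)  # -> 96

# A gradient vector with norm 5.0 is rescaled to norm 3.5.
clipped, norm = clip_grad_norm([[3.0, 4.0]])
print(norm)  # -> 5.0
```

Clipping the *global* norm (rather than per-tensor) preserves the direction of the overall gradient while capping its magnitude, which is why a single threshold like 3.5 suffices for the whole model.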