Update README.md
README.md CHANGED
@@ -45,7 +45,7 @@ The model was trained using the following setup:
 - **Weight Decay:** 0.05
 - **Batch Size:** 2048 sequences
 - **Sequence Length:** 2048 tokens
-- **Total Training Tokens:** 2.
+- **Total Training Tokens:** 2.6T
 - **Hardware:** Trained on H100 GPUs
 
 For more detailed training information, please refer to Section 3.4 and Appendix F of the DCLM paper.
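As a rough sanity check on the corrected figure, the batch size and sequence length in the same list imply a tokens-per-step count, and dividing the total by it gives an approximate step count. The sketch below assumes "2.6T" means 2.6 × 10^12 tokens and that every step consumes one full batch of full-length sequences; neither assumption is stated in the README, and the function names are illustrative only.

```python
# Values copied from the README lines shown in the diff above.
BATCH_SIZE_SEQUENCES = 2048
SEQUENCE_LENGTH_TOKENS = 2048
# Assumption: "2.6T" is read as 2.6 * 10**12 tokens.
TOTAL_TRAINING_TOKENS = 2_600_000_000_000


def tokens_per_step(batch_size: int, seq_len: int) -> int:
    """Tokens consumed by one step, assuming full batches of full-length sequences."""
    return batch_size * seq_len


def approx_steps(total_tokens: int, batch_size: int, seq_len: int) -> float:
    """Approximate number of steps implied by the token budget."""
    return total_tokens / tokens_per_step(batch_size, seq_len)


if __name__ == "__main__":
    per_step = tokens_per_step(BATCH_SIZE_SEQUENCES, SEQUENCE_LENGTH_TOKENS)
    steps = approx_steps(
        TOTAL_TRAINING_TOKENS, BATCH_SIZE_SEQUENCES, SEQUENCE_LENGTH_TOKENS
    )
    print(f"tokens per step: {per_step:,}")
    print(f"approximate steps: {steps:,.0f}")
```

Under these assumptions each step covers about 4.2 million tokens, so the budget corresponds to roughly 620,000 steps. The actual schedule may differ; the paper sections cited in the README are the authoritative source.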