Update README.md
README.md
CHANGED
@@ -15,7 +15,7 @@ Doge uses `wsd_scheduler` as the training scheduler, which divides the learning
 
 Here are the initial learning rates required to continue training at each checkpoint:
 
-- **[Doge-40M](https://huggingface.co/SmallDoge/Doge-40M-checkpoint)
+- **[Doge-40M](https://huggingface.co/SmallDoge/Doge-40M-checkpoint): 8e-3**
 - [Doge-40M-MoE](https://huggingface.co/SmallDoge/Doge-40M-MoE-checkpoint): 8e-3
 
 
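
The learning rates listed in this hunk could be used when resuming training roughly as sketched below. This is a minimal, hedged illustration and is not taken from the Doge repository: `RESUME_LR`, `resume_learning_rate`, and `wsd_multiplier` are hypothetical names, and the linear warmup and linear decay shapes are assumptions about a generic warmup-stable-decay schedule, not a description of how `wsd_scheduler` is actually implemented.

```python
# Initial learning rates copied from the README lines in this diff.
RESUME_LR = {
    "SmallDoge/Doge-40M-checkpoint": 8e-3,
    "SmallDoge/Doge-40M-MoE-checkpoint": 8e-3,
}


def resume_learning_rate(checkpoint: str) -> float:
    """Look up the initial learning rate for a listed checkpoint (hypothetical helper)."""
    if checkpoint not in RESUME_LR:
        raise ValueError(f"no learning rate listed for {checkpoint!r}")
    return RESUME_LR[checkpoint]


def wsd_multiplier(step: int, warmup_steps: int, stable_steps: int, decay_steps: int) -> float:
    """Generic warmup-stable-decay factor in [0, 1].

    Assumption: linear warmup from 0 and linear decay to 0; the real
    scheduler may use different shapes.
    """
    if step < warmup_steps:
        return step / warmup_steps  # warmup: ramp up linearly
    if step < warmup_steps + stable_steps:
        return 1.0  # stable: hold the peak learning rate
    steps_into_decay = step - warmup_steps - stable_steps
    # decay: ramp down linearly, clamped at 0 once the decay phase ends
    return max(0.0, 1.0 - steps_into_decay / max(1, decay_steps))


if __name__ == "__main__":
    base_lr = resume_learning_rate("SmallDoge/Doge-40M-checkpoint")
    for step in (0, 5, 15, 35):
        print(step, base_lr * wsd_multiplier(step, 10, 20, 10))
```

The lookup table keeps the per-checkpoint values in one place, so a resumed run fails loudly when a checkpoint has no listed rate instead of silently falling back to a default.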