SmallDoge/Doge-60M-checkpoint

JingzeShi committed on Jan 31, 2025

Commit 1f9da74 · verified · 1 Parent(s): 089ab87

Update README.md

Files changed (1)

README.md +3 -3

@@ -16,9 +16,9 @@ Doge uses `wsd_scheduler` as the training scheduler, which divides the learning
 Here are the initial learning rates required to continue training at each checkpoint:
-- **[Doge-20M](https://huggingface.co/JingzeShi/Doge-20M-checkpoint)**: 8e-3
-- **[Doge-60M](https://huggingface.co/JingzeShi/Doge-60M-checkpoint)**: 6e-3
-- **Doge-160M**: 4e-3
+- **[Doge-20M](https://huggingface.co/SmallDoge/Doge-20M-checkpoint)**: 8e-3
+- **[Doge-60M](https://huggingface.co/SmallDoge/Doge-60M-checkpoint)**: 6e-3
+- **[Doge-160M](https://huggingface.co/SmallDoge/Doge-160M-checkpoint)**: 4e-3
 - **Doge-320M**: 2e-3
 | Model | Learning Rate | Schedule | Warmup Steps | Stable Steps |
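
For illustration, the learning rates listed in this change could be kept in a small lookup table when resuming training from a checkpoint. This is only a minimal sketch, not code from the repository: the names `RESUME_LEARNING_RATES`, `CHECKPOINT_REPOS`, and `resume_learning_rate` are hypothetical, and it uses only the values and repository IDs that appear in the diff above (Doge-320M has no checkpoint link there, so none is assumed).

```python
# Hypothetical lookup tables; values copied from the README list in this diff.
RESUME_LEARNING_RATES = {
    "Doge-20M": 8e-3,
    "Doge-60M": 6e-3,
    "Doge-160M": 4e-3,
    "Doge-320M": 2e-3,
}

# Checkpoint repositories linked in the updated README.
# Doge-320M is listed without a link, so it is deliberately left out.
CHECKPOINT_REPOS = {
    "Doge-20M": "SmallDoge/Doge-20M-checkpoint",
    "Doge-60M": "SmallDoge/Doge-60M-checkpoint",
    "Doge-160M": "SmallDoge/Doge-160M-checkpoint",
}


def resume_learning_rate(model_name: str) -> float:
    """Return the initial learning rate to use when continuing training."""
    try:
        return RESUME_LEARNING_RATES[model_name]
    except KeyError:
        known = ", ".join(sorted(RESUME_LEARNING_RATES))
        raise ValueError(
            f"Unknown model {model_name!r}; expected one of: {known}"
        ) from None


if __name__ == "__main__":
    for name, lr in RESUME_LEARNING_RATES.items():
        repo = CHECKPOINT_REPOS.get(name, "(no checkpoint link in the diff)")
        print(f"{name}: lr={lr:g}, repo={repo}")
```

The returned value would then be passed as the initial learning rate of whatever optimizer and scheduler the training script sets up.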