SmallDoge/Doge-20M-checkpoint

JingzeShi committed on Jan 20, 2025

Commit 6af69e5 · verified · 1 Parent(s): dccc3c8

Update README.md

Files changed (1)

README.md +10 -3

@@ -16,7 +16,14 @@ Doge uses `wsd_scheduler` as the training scheduler, which divides the learning
 Here are the initial learning rates required to continue training at each checkpoint:
-- **Doge-20M**: 8e-3
-- **Doge-60M**: 6e-3
 - **Doge-160M**: 4e-3
-- **Doge-320M**: 2e-3

 Here are the initial learning rates required to continue training at each checkpoint:
+- **[Doge-20M](https://huggingface.co/JingzeShi/Doge-20M-checkpoint)**: 8e-3
+- **[Doge-60M](https://huggingface.co/JingzeShi/Doge-60M-checkpoint)**: 6e-3
 - **Doge-160M**: 4e-3
+- **Doge-320M**: 2e-3
+| Model | Learning Rate | Schedule | Warmup Steps | Stable Steps |
+|-------|---------------|----------|--------------|--------------|
+| Doge-20M | 8e-3 | wsd_scheduler | 800 | 6400 |
+| Doge-60M | 6e-3 | wsd_scheduler | 1600 | 12800 |
+| Doge-160M | 4e-3 | wsd_scheduler | 2400 | 19200 |
+| Doge-320M | 2e-3 | wsd_scheduler | 3200 | 25600 |
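
For readers resuming from one of these checkpoints, here is a minimal sketch of how the table's columns could map onto a warmup-stable-decay learning rate curve. It is not the repository's `wsd_scheduler` implementation. The function name `wsd_lr`, the `decay_steps` value, and the linear warmup and decay shapes are assumptions made for illustration; the diff states only the peak learning rate, warmup steps, and stable steps.

```python
# Values copied from the table in the diff: (learning rate, warmup steps, stable steps).
CHECKPOINT_SCHEDULES = {
    "Doge-20M": (8e-3, 800, 6400),
    "Doge-60M": (6e-3, 1600, 12800),
    "Doge-160M": (4e-3, 2400, 19200),
    "Doge-320M": (2e-3, 3200, 25600),
}


def wsd_lr(step, peak_lr, warmup_steps, stable_steps, decay_steps):
    """Learning rate at `step` for a warmup-stable-decay schedule.

    Assumes linear warmup from 0 and linear decay to 0; the actual
    scheduler may use different shapes.
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 up to the peak rate.
        return peak_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable phase: hold the peak rate.
        return peak_lr
    # Decay phase: ramp linearly down to 0, then stay at 0.
    decay_progress = (step - warmup_steps - stable_steps) / decay_steps
    return peak_lr * max(0.0, 1.0 - decay_progress)


if __name__ == "__main__":
    peak_lr, warmup, stable = CHECKPOINT_SCHEDULES["Doge-20M"]
    decay = 800  # hypothetical decay length, not given in the diff
    for step in (0, 400, 800, 7199, 7600, 8000):
        print(step, wsd_lr(step, peak_lr, warmup, stable, decay))
```

Under these assumptions, a run that continues from a checkpoint saved during the stable phase would start at the peak rate listed for that model, which is consistent with the "initial learning rates" wording in the README.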