Doge uses `wsd_scheduler` as the training scheduler, which divides the learning rate into three stages: `warmup`, `stable`, and `decay`. This allows training to be resumed from any checkpoint in the `stable` stage without causing a loss rebound.

Here are the initial learning rates required to continue training from each checkpoint:

- **Doge-20M**: 8e-3
- **Doge-60M**: 6e-3
- **Doge-160M**: 4e-3
- **Doge-320M**: 2e-3
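A warmup-stable-decay schedule of this kind can be sketched as a simple piecewise function. This is an illustrative sketch, not the actual `wsd_scheduler` implementation; the function name, stage lengths, and linear ramp shapes here are assumptions for demonstration.

```python
def wsd_lr(step, max_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Illustrative warmup-stable-decay (WSD) learning-rate schedule.

    Three stages: linear warmup to max_lr, a long constant (stable)
    plateau, then linear decay to min_lr. Resuming from a checkpoint
    taken during the stable stage simply restarts at max_lr, which is
    why the listed per-model learning rates are the stable-stage values.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to max_lr.
        return max_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold max_lr constant; checkpoints here resume cleanly.
        return max_lr
    # Decay: ramp linearly from max_lr down to min_lr, then clamp.
    progress = min(1.0, (step - warmup_steps - stable_steps) / decay_steps)
    return max_lr - (max_lr - min_lr) * progress


# Example: resuming a hypothetical Doge-20M run mid-way through the
# stable stage yields the listed initial learning rate of 8e-3.
lr = wsd_lr(step=500, max_lr=8e-3, warmup_steps=100,
            stable_steps=800, decay_steps=100)
print(lr)  # 0.008
```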