Update README.md
Browse files
README.md
CHANGED
|
@@ -12,7 +12,7 @@ pipeline_tag: text-generation
|
|
| 12 |
|
| 13 |

|
| 14 |
|
| 15 |
-
Doge uses `wsd_scheduler` as the training scheduler, which divides the learning rate into three stages: `warmup`, `stable`, and `decay`. It allows us to continue training from any checkpoint in the `stable stage` without
|
| 16 |
|
| 17 |
Here are the initial learning rates required to continue training at each checkpoint:
|
| 18 |
|
|
|
|
| 12 |
|
| 13 |

|
| 14 |
|
| 15 |
+
Doge uses `wsd_scheduler` as the training scheduler, which divides the learning rate into three stages: `warmup`, `stable`, and `decay`. It allows us to continue training on any new dataset from any checkpoint in the `stable stage` without spikes of the training.
|
| 16 |
|
| 17 |
Here are the initial learning rates required to continue training at each checkpoint:
|
| 18 |
|