JingzeShi committed · Commit dccc3c8 · verified · 1 Parent(s): 14c0160

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -12,11 +12,11 @@ pipeline_tag: text-generation
 
 ![wsd_scheduler](./wsd_scheduler.png)
 
-Doge uses `wsd_scheduler` as the training scheduler, which divides the learning rate into three stages: warmup, stable, and decay. It allows us to continue training from any checkpoint in the stable stage without causing loss rebound.
+Doge uses `wsd_scheduler` as the training scheduler, which divides the learning rate into three stages: `warmup`, `stable`, and `decay`. It allows us to continue training from any checkpoint in the `stable stage` without causing loss rebound.
 
 Here are the initial learning rates required to continue training at each checkpoint:
 
-- `Doge-20M`: 8e-3
-- `Doge-60M`: 6e-3
-- `Doge-160M`: 4e-3
-- `Doge-320M`: 2e-3
+- **Doge-20M**: 8e-3
+- **Doge-60M**: 6e-3
+- **Doge-160M**: 4e-3
+- **Doge-320M**: 2e-3
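
For readers unfamiliar with the three-stage schedule the README describes, here is a minimal sketch of a warmup-stable-decay learning rate function. It is not the Doge training code: the function name `wsd_lr`, its parameters, the linear warmup and linear decay shapes, and the step counts in the usage lines are all assumptions chosen for illustration. Only the per-checkpoint learning rates in `RESUME_LR` come from the README.

```python
# Initial learning rates for continued training, as listed in the README.
RESUME_LR = {
    "Doge-20M": 8e-3,
    "Doge-60M": 6e-3,
    "Doge-160M": 4e-3,
    "Doge-320M": 2e-3,
}


def wsd_lr(step, peak_lr, warmup_steps, stable_steps, decay_steps, min_lr=0.0):
    """Learning rate at `step` under a warmup-stable-decay schedule.

    Hypothetical helper: linear warmup and linear decay are assumed here;
    the actual Doge scheduler may use different shapes.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly up to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable: hold the learning rate constant at peak_lr.
        return peak_lr
    # Decay: move linearly from peak_lr to min_lr, then stay at min_lr.
    decay_step = step - warmup_steps - stable_steps
    progress = min(decay_step / decay_steps, 1.0)
    return peak_lr + (min_lr - peak_lr) * progress


# Illustrative step counts (assumed, not from the README).
peak = RESUME_LR["Doge-20M"]
schedule = [wsd_lr(s, peak, warmup_steps=100, stable_steps=800, decay_steps=100)
            for s in range(1000)]
```

Because the rate is constant throughout the stable stage in this sketch, a run resumed from any stable-stage checkpoint can start at the same value it was trained with, which is consistent with the README listing a single initial learning rate per model.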