JingzeShi committed on
Commit 6af69e5 · verified · 1 parent: dccc3c8

Update README.md

Files changed (1)
  1. README.md +10 -3
README.md CHANGED
@@ -16,7 +16,14 @@ Doge uses `wsd_scheduler` as the training scheduler, which divides the learning
 
 Here are the initial learning rates required to continue training at each checkpoint:
 
-- **Doge-20M**: 8e-3
-- **Doge-60M**: 6e-3
+- **[Doge-20M](https://huggingface.co/JingzeShi/Doge-20M-checkpoint)**: 8e-3
+- **[Doge-60M](https://huggingface.co/JingzeShi/Doge-60M-checkpoint)**: 6e-3
 - **Doge-160M**: 4e-3
-- **Doge-320M**: 2e-3
+- **Doge-320M**: 2e-3
+
+| Model | Learning Rate | Schedule | Warmup Steps | Stable Steps |
+|-------|---------------|----------|--------------|--------------|
+| Doge-20M | 8e-3 | wsd_scheduler | 800 | 6400 |
+| Doge-60M | 6e-3 | wsd_scheduler | 1600 | 12800 |
+| Doge-160M | 4e-3 | wsd_scheduler | 2400 | 19200 |
+| Doge-320M | 2e-3 | wsd_scheduler | 3200 | 25600 |
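The table added in this commit gives each model's peak learning rate and the lengths of the warmup and stable phases of `wsd_scheduler`, but not the length or shape of the decay phase. To show how these numbers fit together, here is a minimal Python sketch of a warmup-stable-decay schedule. Everything beyond the table values is an assumption made for illustration: the names `DOGE_SCHEDULES` and `wsd_lr`, linear warmup from zero, linear decay, and the `decay_steps` and `min_lr_ratio` parameters are all hypothetical. This is not the Doge training code or the actual `wsd_scheduler` implementation.

```python
# Values copied from the README table in this commit.
DOGE_SCHEDULES = {
    "Doge-20M": {"lr": 8e-3, "warmup_steps": 800, "stable_steps": 6400},
    "Doge-60M": {"lr": 6e-3, "warmup_steps": 1600, "stable_steps": 12800},
    "Doge-160M": {"lr": 4e-3, "warmup_steps": 2400, "stable_steps": 19200},
    "Doge-320M": {"lr": 2e-3, "warmup_steps": 3200, "stable_steps": 25600},
}


def wsd_lr(step, peak_lr, warmup_steps, stable_steps, decay_steps, min_lr_ratio=0.0):
    """Hypothetical warmup-stable-decay learning rate at a given optimizer step.

    Assumptions (not taken from the Doge code): linear warmup from 0,
    a constant plateau at peak_lr, then a linear decay over decay_steps
    down to peak_lr * min_lr_ratio, held there afterwards.
    """
    if step < warmup_steps:
        # Warmup phase: ramp linearly from 0 toward peak_lr.
        return peak_lr * step / warmup_steps
    if step < warmup_steps + stable_steps:
        # Stable phase: hold the learning rate at its peak value.
        return peak_lr
    # Decay phase: interpolate linearly from peak_lr to min_lr, then clamp.
    progress = min((step - warmup_steps - stable_steps) / decay_steps, 1.0)
    min_lr = peak_lr * min_lr_ratio
    return peak_lr - (peak_lr - min_lr) * progress


if __name__ == "__main__":
    cfg = DOGE_SCHEDULES["Doge-20M"]
    # decay_steps=1600 is an illustrative guess; the README does not list it.
    for step in (0, 400, 800, 7200, 8000, 8800):
        lr = wsd_lr(step, cfg["lr"], cfg["warmup_steps"], cfg["stable_steps"], decay_steps=1600)
        print(step, lr)
```

In every row of the table the stable phase is exactly 8 times the warmup phase (for example, 800 × 8 = 6400 for Doge-20M). Under this sketch, the learning rate sits at the listed value for the whole plateau. The decay length would still have to come from the original training configuration.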