The table below lists the training hyperparameters, with SFT and DPO as rows:
| Stage | Learning Rate | Beta | Epochs | LR Schedule | Weight Decay | Gradient Clipping | Max. Seq. Length |
|---|---|---|---|---|---|---|---|
| **SFT** | 2 × 10^-6 | N/A | 3 | Linear warmup for the first 3% of total training time, then cooldown to 0 | 0 | 0 | 2048 |
| **DPO** | 5 × 10^-7 | 0.1 | 3 | Linear warmup for the first 10% of total training time, then cooldown to 0 | 0 | 0 | 2048 |
Compared to Tulu 2, the DPO hyperparameters are identical. SFT uses a lower learning rate, trains for 3 epochs instead of 2, and uses a maximum sequence length of 2048 instead of 8192.
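As an illustration (not the actual training code), the schedule described in the table — linear warmup over an initial fraction of training, then linear cooldown to 0 — can be sketched as a simple step-to-LR function; the function name and step counts here are hypothetical:

```python
def lr_at_step(step: int, total_steps: int, peak_lr: float,
               warmup_frac: float = 0.03) -> float:
    """Linear warmup to peak_lr over the first warmup_frac of training,
    then linear cooldown to 0 by the final step (illustrative sketch)."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Warmup phase: LR rises linearly from ~0 to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Cooldown phase: LR falls linearly from peak_lr to 0 at total_steps.
    remaining = total_steps - warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / remaining)

# SFT-style settings: peak LR 2e-6, warmup over the first 3% of steps.
print(lr_at_step(0, 1000, 2e-6))    # early in warmup, well below peak
print(lr_at_step(500, 1000, 2e-6))  # mid-training, decaying toward 0
```

The DPO row would use `peak_lr=5e-7` and `warmup_frac=0.10`; frameworks typically implement this via a per-step LR-lambda hook rather than a standalone function.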
## Bias, Risks, and Limitations