The table below lists the training hyperparameters, with SFT and DPO as rows:
| Stage | Learning Rate | Beta | Epochs | LR Schedule | Weight Decay | Gradient Clipping | Max. Seq. Length |
|---|---|---|---|---|---|---|---|
| **SFT** | 2 × 10^-6 | N/A | 3 | Linear warmup for the first 3% of total training time, then cooldown to 0 | 0 | 0 | 2048 |
| **DPO** | 5 × 10^-7 | 0.1 | 3 | Linear warmup for the first 10% of total training time, then cooldown to 0 | 0 | 0 | 2048 |
Compared to Tulu 2, the DPO hyperparameters are identical. SFT uses a lower learning rate, trains for 3 epochs instead of 2, and uses a maximum sequence length of 2048 instead of 8192.
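As an illustration (not the actual training code), the schedule described in the table — linear warmup over an initial fraction of training, then linear cooldown to 0 — can be sketched as a simple step-to-LR function; the function name and step counts here are hypothetical:

```python
def lr_at_step(step: int, total_steps: int, peak_lr: float,
               warmup_frac: float = 0.03) -> float:
    """Linear warmup to peak_lr over the first warmup_frac of training,
    then linear cooldown to 0 by the final step (illustrative sketch)."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Warmup phase: LR rises linearly from ~0 to peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Cooldown phase: LR falls linearly from peak_lr to 0 at total_steps.
    remaining = total_steps - warmup_steps
    return peak_lr * max(0.0, (total_steps - step) / remaining)

# SFT-style settings: peak LR 2e-6, warmup over the first 3% of steps.
print(lr_at_step(0, 1000, 2e-6))    # early in warmup, well below peak
print(lr_at_step(500, 1000, 2e-6))  # mid-training, decaying toward 0
```

The DPO row would use `peak_lr=5e-7` and `warmup_frac=0.10`; frameworks typically implement this via a per-step LR-lambda hook rather than a standalone function.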
## Bias, Risks, and Limitations