Tandogan committed on
Commit f4d700e · verified · 1 Parent(s): fe6a9f2

Update README.md

Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -28,7 +28,7 @@ DPO lets us directly train the model to score preferred responses higher than le
  - **Optimizer**: AdamW (learning rate = `2e-6`, weight decay = `0`)
  - **Precision**: bf16
  - **Batch size**: 2 (gradient accumulation = 4)
- - **Scheduler**: Linear with 7% warmup
+ - **Scheduler**: Cosine with 1% warmup
  - **DPO Beta**: 0.1
  - **Eval & Checkpointing**: Every epoch
  - **Monitoring**: Weights & Biases (WandB)
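The changed line swaps the learning-rate schedule from linear decay with 7% warmup to cosine decay with 1% warmup. A minimal sketch of what the new schedule computes at each optimizer step (the function name and step math here are illustrative, not the repo's actual training code):

```python
import math

# Values taken from the hyperparameter list in the diff.
BASE_LR = 2e-6
WARMUP_FRAC = 0.01  # the commit lowers warmup from 7% to 1%

def lr_at_step(step, total_steps, base_lr=BASE_LR, warmup_frac=WARMUP_FRAC):
    """Linear warmup for the first `warmup_frac` of training,
    then cosine decay from `base_lr` down to 0."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr during warmup.
        return base_lr * step / warmup_steps
    # Cosine decay over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

Compared with the old linear schedule, the cosine curve holds the learning rate near its peak longer before decaying, and the shorter warmup reaches the peak `2e-6` sooner.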