samagra14wefi committed
Commit 9696018 · 1 Parent(s): ad7a3fc

Update README.md
Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -155,7 +155,7 @@ trainer.train()
 
 ### Loss Function Consideration
 
-Anthropic recommends using the loss function \( LPM = \log(1 + e^{\text{{rbad}} - \text{{rgood}}}) \) for preference models. However, this PreferED model was trained using binary cross-entropy loss, and therefore changing the loss functions might increase the training time to converge. For more details on preference models and loss functions, you may refer to the paper by Askell et al., 2021: [A General Language Assistant as a Laboratory for Alignment](https://arxiv.org/abs/2112.00861).
+Anthropic recommends using the loss function L<sub>PM</sub> = log(1 + e^(r<sub>bad</sub> - r<sub>good</sub>)) for preference models. However, this PreferED model was trained using binary cross-entropy loss, so switching loss functions may increase the training time needed to converge. For more details on preference models and loss functions, see Askell et al., 2021: [A General Language Assistant as a Laboratory for Alignment](https://arxiv.org/abs/2112.00861).
 
 
 
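
For readers comparing the two objectives, here is a minimal PyTorch sketch (not from the PreferED codebase; the function and variable names are illustrative assumptions) of the recommended preference-model loss alongside the binary cross-entropy objective the model was actually trained with:

```python
import torch
import torch.nn.functional as F

def preference_model_loss(r_good: torch.Tensor, r_bad: torch.Tensor) -> torch.Tensor:
    """L_PM = log(1 + e^(r_bad - r_good)), averaged over the batch.

    softplus(x) = log(1 + e^x), so this form stays numerically stable
    even when r_bad - r_good is large.
    """
    return F.softplus(r_bad - r_good).mean()

def bce_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy on one scalar logit per example, the kind of
    objective PreferED was trained with (1.0 = preferred, 0.0 = dispreferred)."""
    return F.binary_cross_entropy_with_logits(logits, labels)

# Illustrative usage with random reward scores for 8 preference pairs:
r_good, r_bad = torch.randn(8), torch.randn(8)
print(preference_model_loss(r_good, r_bad))
print(bce_loss(torch.cat([r_good, r_bad]),
               torch.cat([torch.ones(8), torch.zeros(8)])))
```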