samagra14wefi committed
Commit 9696018 · 1 Parent(s): ad7a3fc

Update README.md
Files changed (1):
  1. README.md +1 -1
README.md CHANGED
@@ -155,7 +155,7 @@ trainer.train()
 
 ### Loss Function Consideration
 
-Anthropic recommends using the loss function \( LPM = \log(1 + e^{\text{{rbad}} - \text{{rgood}}}) \) for preference models. However, this PreferED model was trained using binary cross-entropy loss, and therefore changing the loss functions might increase the training time to converge. For more details on preference models and loss functions, you may refer to the paper by Askell et al., 2021: [A General Language Assistant as a Laboratory for Alignment](https://arxiv.org/abs/2112.00861).
+Anthropic recommends using the loss function L<sub>PM</sub> = log(1 + e^(r<sub>bad</sub> - r<sub>good</sub>)) for preference models. However, this PreferED model was trained using binary cross-entropy loss, so switching loss functions may increase the training time needed to converge. For more details on preference models and loss functions, see Askell et al., 2021: [A General Language Assistant as a Laboratory for Alignment](https://arxiv.org/abs/2112.00861).
 
 
 
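
For readers comparing the two objectives, here is a minimal PyTorch sketch (not from the PreferED codebase; the function and variable names are illustrative assumptions) of the recommended preference-model loss alongside the binary cross-entropy objective the model was actually trained with:

```python
import torch
import torch.nn.functional as F

def preference_model_loss(r_good: torch.Tensor, r_bad: torch.Tensor) -> torch.Tensor:
    """L_PM = log(1 + e^(r_bad - r_good)), averaged over the batch.

    softplus(x) = log(1 + e^x), so this form stays numerically stable
    even when r_bad - r_good is large.
    """
    return F.softplus(r_bad - r_good).mean()

def bce_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Binary cross-entropy on one scalar logit per example, the kind of
    objective PreferED was trained with (1.0 = preferred, 0.0 = dispreferred)."""
    return F.binary_cross_entropy_with_logits(logits, labels)

# Illustrative usage with random reward scores for 8 preference pairs:
r_good, r_bad = torch.randn(8), torch.randn(8)
print(preference_model_loss(r_good, r_bad))
print(bce_loss(torch.cat([r_good, r_bad]),
               torch.cat([torch.ones(8), torch.zeros(8)])))
```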