Update README.md

Two DPO fine-tuning experiments were run:

- **Monitoring**: Weights & Biases (WandB)
- **Best Epoch Selection**: Based on validation loss

## Intended Use

This model is intended for research and experimentation with preference-based alignment and reward modeling. It is **not** production-ready and may produce hallucinated, biased, or unsafe outputs. Please evaluate carefully for downstream tasks.

## How to Use

You can use the model with the `transformers` and `trl` libraries for inference or evaluation:
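A minimal inference sketch with `transformers` is shown below. The model ID is a placeholder, not the actual repository name, so substitute the real Hub path before running; generation settings are illustrative defaults, not the settings used in training.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model ID -- replace with the actual Hub repository for this model.
model_id = "your-username/your-dpo-model"

# Load the tokenizer and the fine-tuned causal LM from the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain preference-based alignment in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding keeps the example deterministic; adjust sampling as needed.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For evaluation or further preference tuning, the same checkpoint can be passed to `trl` trainers (e.g. `DPOTrainer`) in place of the base model.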