Update README.md

Two DPO fine-tuning experiments were run:

- **Monitoring**: Weights & Biases (WandB)
- **Best Epoch Selection**: Based on validation loss

## Intended Use

This model is intended for research and experimentation with preference-based alignment and reward modeling. It is **not** production-ready and may produce hallucinated, biased, or unsafe outputs. Please evaluate carefully for downstream tasks.

## How to Use

You can use the model with the `transformers` and `trl` libraries for inference or evaluation:
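A minimal inference sketch with `transformers` is shown below. The model ID is a placeholder, not the actual repository name, so substitute the real Hub path before running; generation settings are illustrative defaults, not the settings used in training.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model ID -- replace with the actual Hub repository for this model.
model_id = "your-username/your-dpo-model"

# Load the tokenizer and the fine-tuned causal LM from the Hub.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain preference-based alignment in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding keeps the example deterministic; adjust sampling as needed.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

For evaluation or further preference tuning, the same checkpoint can be passed to `trl` trainers (e.g. `DPOTrainer`) in place of the base model.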