How to use EllieS/zephyr-dpo-timedial with PEFT:
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then apply the PEFT adapter on top of it
base_model = AutoModelForCausalLM.from_pretrained("alignment-handbook/zephyr-7b-sft-full")
model = PeftModel.from_pretrained(base_model, "EllieS/zephyr-dpo-timedial")

This model is a fine-tuned version of EllieS/zephyr-sft-timedial on the EllieS/timedial_dpo dataset. It achieves the following results on the evaluation set:
More information needed
The following hyperparameters were used during training:
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.3995 | 0.35 | 100 | 0.3705 | 0.2856 | -0.5202 | 1.0 | 0.8058 | -97.0354 | -1.9368 | -2.7815 | -2.7803 |
| 0.2236 | 0.69 | 200 | 0.2236 | 0.2987 | -1.0958 | 1.0 | 1.3944 | -154.5925 | -0.6286 | -2.7419 | -2.7480 |
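
To help read the reward columns, here is a small sketch that recomputes Rewards/margins from the table. It assumes the columns follow the usual DPO logging convention, in which the margin is the chosen reward minus the rejected reward. The `rows` list and the `reward_margin` helper are illustrative names, not part of the model's training code.

```python
# Values copied from the training-results table above:
# (step, rewards/chosen, rewards/rejected, reported rewards/margins)
rows = [
    (100, 0.2856, -0.5202, 0.8058),
    (200, 0.2987, -1.0958, 1.3944),
]


def reward_margin(chosen: float, rejected: float) -> float:
    """Assumed convention: margin = chosen reward - rejected reward."""
    return chosen - rejected


for step, chosen, rejected, reported in rows:
    margin = reward_margin(chosen, rejected)
    # The table values are rounded to four decimals, so allow a small tolerance.
    matches = abs(margin - reported) < 5e-4
    print(f"step {step}: computed {margin:.4f}, reported {reported:.4f}, match={matches}")
```

Under that assumption, both rows agree with the reported margins up to rounding, and the margin grows between step 100 and step 200 mainly because the rejected reward falls.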
Base model
mistralai/Mistral-7B-v0.1