How to use DUAL-GPO/phi-2-dpo-test-iter-0 with PEFT:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the base model, then attach the adapter weights on top of it
base_model = AutoModelForCausalLM.from_pretrained("microsoft/phi-2")
model = PeftModel.from_pretrained(base_model, "DUAL-GPO/phi-2-dpo-test-iter-0")
```

This model is a fine-tuned version of lole25/phi-2-sft-ultrachat-lora on the HuggingFaceH4/ultrafeedback_binarized dataset. Evaluation-set results for each logged checkpoint are listed in the training results table.
Model description: More information needed.

Intended uses & limitations: More information needed.

Training and evaluation data: More information needed.
The following hyperparameters were used during training:

Training results:
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0001 | 0.32 | 100 | 0.0002 | -0.0012 | -0.0015 | 0.5200 | 0.0003 | -233.6874 | -256.7341 | 0.8840 | 0.8263 |
| 0.0001 | 0.64 | 200 | 0.0002 | -0.0021 | -0.0023 | 0.5005 | 0.0002 | -233.7691 | -256.8278 | 0.8778 | 0.8201 |
| 0.0001 | 0.96 | 300 | 0.0002 | -0.0021 | -0.0024 | 0.4985 | 0.0003 | -233.7780 | -256.8272 | 0.8783 | 0.8206 |
| 0.0001 | 1.28 | 400 | 0.0002 | -0.0026 | -0.0029 | 0.5195 | 0.0003 | -233.8277 | -256.8757 | 0.8769 | 0.8192 |
| 0.0001 | 1.6 | 500 | 0.0002 | -0.0027 | -0.0030 | 0.5170 | 0.0003 | -233.8388 | -256.8869 | 0.8729 | 0.8151 |
| 0.0001 | 1.92 | 600 | 0.0002 | -0.0027 | -0.0030 | 0.5070 | 0.0003 | -233.8414 | -256.8860 | 0.8757 | 0.8180 |
| 0.0001 | 2.24 | 700 | 0.0002 | -0.0030 | -0.0032 | 0.5065 | 0.0002 | -233.8592 | -256.9123 | 0.8719 | 0.8142 |
| 0.0001 | 2.56 | 800 | 0.0002 | -0.0028 | -0.0030 | 0.5190 | 0.0003 | -233.8422 | -256.8898 | 0.8713 | 0.8135 |
| 0.0001 | 2.88 | 900 | 0.0002 | -0.0030 | -0.0031 | 0.5015 | 0.0002 | -233.8529 | -256.9111 | 0.8714 | 0.8136 |
| 0.0001 | 3.2 | 1000 | 0.0002 | -0.0029 | -0.0033 | 0.5180 | 0.0004 | -233.8666 | -256.9036 | 0.8733 | 0.8156 |
| 0.0001 | 3.52 | 1100 | 0.0002 | -0.0029 | -0.0034 | 0.5265 | 0.0005 | -233.8779 | -256.9080 | 0.8724 | 0.8145 |
| 0.0001 | 3.84 | 1200 | 0.0002 | -0.0031 | -0.0033 | 0.5045 | 0.0003 | -233.8733 | -256.9227 | 0.8705 | 0.8127 |
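The Rewards/* columns appear to follow the metric names logged by TRL's `DPOTrainer`. As a minimal sketch, assuming the standard DPO definitions (implicit reward = beta * (policy log-prob - reference log-prob), averaged over the batch), the helpers below show how those columns relate to each other. The batch key names are hypothetical, and beta is left as a required argument because this card does not report the value used. The sketch also checks that Rewards/margins equals Rewards/chosen - Rewards/rejected for every row in the table, up to 4-decimal rounding. It does not reproduce the Training Loss or Validation Loss columns, since the objective used for this run is not documented here.

```python
from typing import Dict, List, Tuple


def implicit_reward(policy_logp: float, ref_logp: float, beta: float) -> float:
    """DPO implicit reward: beta * (log pi_theta(y|x) - log pi_ref(y|x))."""
    return beta * (policy_logp - ref_logp)


def reward_metrics(batch: List[Dict[str, float]], beta: float) -> Dict[str, float]:
    """Batch-mean Rewards/* metrics for (chosen, rejected) preference pairs.

    Each item holds summed log-probs of the chosen and rejected responses under
    the policy and the frozen reference model (hypothetical key names).
    """
    chosen = [implicit_reward(b["policy_chosen"], b["ref_chosen"], beta) for b in batch]
    rejected = [implicit_reward(b["policy_rejected"], b["ref_rejected"], beta) for b in batch]
    n = len(batch)
    return {
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        # Margin is chosen minus rejected, averaged over the batch
        "rewards/margins": sum(c - r for c, r in zip(chosen, rejected)) / n,
        # Fraction of pairs where the chosen response earns the higher reward
        "rewards/accuracies": sum(c > r for c, r in zip(chosen, rejected)) / n,
    }


# (step, Rewards/chosen, Rewards/rejected, Rewards/margins) copied from the table above
TABLE_ROWS: List[Tuple[int, float, float, float]] = [
    (100, -0.0012, -0.0015, 0.0003),
    (200, -0.0021, -0.0023, 0.0002),
    (300, -0.0021, -0.0024, 0.0003),
    (400, -0.0026, -0.0029, 0.0003),
    (500, -0.0027, -0.0030, 0.0003),
    (600, -0.0027, -0.0030, 0.0003),
    (700, -0.0030, -0.0032, 0.0002),
    (800, -0.0028, -0.0030, 0.0003),
    (900, -0.0030, -0.0031, 0.0002),
    (1000, -0.0029, -0.0033, 0.0004),
    (1100, -0.0029, -0.0034, 0.0005),
    (1200, -0.0031, -0.0033, 0.0003),
]


def max_rounding_gap(rows: List[Tuple[int, float, float, float]]) -> float:
    """Largest |chosen - rejected - margins| across the logged rows.

    Each logged value is rounded to 4 decimals, so gaps of up to about 1.5e-4
    are expected even when margins = chosen - rejected holds exactly.
    """
    return max(abs(c - r - m) for _, c, r, m in rows)
```

On the logged rows, `max_rounding_gap(TABLE_ROWS)` comes out around 1e-4, which is within rounding error, so the margins column is consistent with chosen minus rejected throughout training.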
Base model: microsoft/phi-2