# model_usp2_dpo1
This model is a fine-tuned version of [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 1.0445
- Rewards/chosen: -10.7928
- Rewards/rejected: -14.2959
- Rewards/accuracies: 0.7400
- Rewards/margins: 3.5031
- Logps/rejected: -250.8612
- Logps/chosen: -215.1325
- Logits/rejected: -0.8799
- Logits/chosen: -0.9353
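For context, the reward metrics above follow the standard DPO formulation (assumed here from the metric names, which match TRL's `DPOTrainer` logging; the card itself does not describe the objective). The reward assigned to a completion is the β-scaled log-probability ratio between the fine-tuned policy and the frozen reference model:

$$ r(x, y) = \beta \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)} $$

Under this reading, Rewards/margins is the mean of \\( r(x, y_{\text{chosen}}) - r(x, y_{\text{rejected}}) \\) over evaluation pairs, and Rewards/accuracies is the fraction of pairs for which the chosen completion receives the higher reward.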
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
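As a rough illustration of how these hyperparameters could map onto a training run, below is a minimal sketch using TRL's `DPOTrainer`. The card does not state the training code; TRL is assumed from the metric names, and the dataset, LoRA configuration, and `beta` are placeholders. Loading the gated Llama-2 base model also requires accepted access on the Hub.

```python
# Hypothetical reconstruction -- the card does not state the training code.
# TRL's DPOTrainer is assumed from the Rewards/* metric names; the dataset,
# LoRA settings, and beta below are illustrative placeholders.
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "meta-llama/Llama-2-7b-chat-hf"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Placeholder preference data: DPOTrainer expects prompt/chosen/rejected columns.
train_dataset = Dataset.from_dict({
    "prompt": ["What is 2 + 2?"],
    "chosen": ["2 + 2 = 4."],
    "rejected": ["2 + 2 = 5."],
})

# Values below are taken from the card; everything not listed there is assumed.
training_args = TrainingArguments(
    output_dir="model_usp2_dpo1",
    learning_rate=5e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,  # total train batch size: 4 x 4 = 16
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    seed=42,
)

# LoRA ranks/target modules are not stated in the card; defaults used here.
peft_config = LoraConfig(task_type="CAUSAL_LM")

trainer = DPOTrainer(
    model,
    ref_model=None,           # with a PEFT model, TRL uses the frozen base as reference
    args=training_args,
    beta=0.1,                 # assumed; the card does not report beta
    train_dataset=train_dataset,
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```

The Adam settings in the list above (betas (0.9, 0.999), epsilon 1e-08) match the `transformers` default optimizer, so they need no explicit configuration in this sketch.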
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0632 | 2.67 | 100 | 0.6858 | -5.5164 | -7.1903 | 0.6900 | 1.6739 | -179.8054 | -162.3683 | -0.8170 | -0.7196 |
| 0.002 | 5.33 | 200 | 0.7615 | -6.7178 | -9.5839 | 0.7600 | 2.8661 | -203.7411 | -174.3826 | -1.0768 | -1.0701 |
| 0.0001 | 8.0 | 300 | 1.0247 | -10.4976 | -13.8923 | 0.7400 | 3.3948 | -246.8256 | -212.1801 | -0.8995 | -0.9506 |
| 0.0001 | 10.67 | 400 | 1.0323 | -10.6255 | -14.0760 | 0.7500 | 3.4505 | -248.6621 | -213.4589 | -0.8910 | -0.9437 |
| 0.0001 | 13.33 | 500 | 1.0328 | -10.7107 | -14.1992 | 0.7400 | 3.4885 | -249.8943 | -214.3115 | -0.8858 | -0.9397 |
| 0.0001 | 16.0 | 600 | 1.0378 | -10.7577 | -14.2607 | 0.7400 | 3.5030 | -250.5091 | -214.7812 | -0.8823 | -0.9372 |
| 0.0 | 18.67 | 700 | 1.0407 | -10.7811 | -14.2886 | 0.7500 | 3.5075 | -250.7885 | -215.0155 | -0.8811 | -0.9363 |
| 0.0001 | 21.33 | 800 | 1.0415 | -10.7857 | -14.2997 | 0.7400 | 3.5139 | -250.8989 | -215.0617 | -0.8802 | -0.9359 |
| 0.0001 | 24.0 | 900 | 1.0423 | -10.7886 | -14.2954 | 0.7400 | 3.5068 | -250.8562 | -215.0906 | -0.8802 | -0.9356 |
| 0.0001 | 26.67 | 1000 | 1.0445 | -10.7928 | -14.2959 | 0.7400 | 3.5031 | -250.8612 | -215.1325 | -0.8799 | -0.9353 |
### Framework versions
- PEFT 0.10.0
- Transformers 4.39.3
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
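Since this repository is a PEFT (LoRA) adapter on top of Llama-2-7b-chat-hf, one plausible way to load it for inference is via peft's `AutoPeftModelForCausalLM`, which resolves and loads the base model automatically. This is a minimal sketch, not a documented usage for this repo: access to the gated base model is required, and the dtype, device placement, and generation settings are illustrative.

```python
# Minimal inference sketch: loads the LoRA adapter together with its
# Llama-2-7b-chat-hf base model (gated; requires accepted license on the Hub).
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "guoyu-zhang/model_usp2_dpo1"
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.float16,
    device_map="auto",  # requires accelerate
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

prompt = "[INST] Write a haiku about mountains. [/INST]"  # Llama-2 chat format
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```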