# model_usp2_dpo9

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 3.0422
- Rewards/chosen: -16.6521
- Rewards/rejected: -22.4980
- Rewards/accuracies: 0.6900
- Rewards/margins: 5.8460
- Logps/rejected: -134.2057
- Logps/chosen: -127.2480
- Logits/rejected: -0.3729
- Logits/chosen: -0.3027
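As context for the reward numbers above: in the usual DPO (and TRL) convention, the chosen/rejected rewards are the policy-vs-reference log-probability ratios scaled by β, the margin is their difference, and accuracy is the fraction of pairs with a positive margin. A minimal sketch of that bookkeeping, with β = 0.1 as an illustrative value only (the card does not state the β used):

```python
import math

def dpo_stats(beta, logp_pi_chosen, logp_ref_chosen,
              logp_pi_rejected, logp_ref_rejected):
    """Return (reward_chosen, reward_rejected, margin, loss) for one pair."""
    reward_chosen = beta * (logp_pi_chosen - logp_ref_chosen)
    reward_rejected = beta * (logp_pi_rejected - logp_ref_rejected)
    margin = reward_chosen - reward_rejected
    # DPO loss: -log sigmoid(margin); "rewards/accuracies" counts the
    # fraction of evaluation pairs where margin > 0.
    loss = -math.log(1.0 / (1.0 + math.exp(-margin)))
    return reward_chosen, reward_rejected, margin, loss
```

With illustrative log-probabilities, a chosen completion that the policy upweights relative to the reference yields a positive reward, and a widening gap between the two rewards drives the loss toward zero.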
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 4
- eval_batch_size: 1
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- training_steps: 1000
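The list above maps onto a Hugging Face `TrainingArguments` object roughly as follows. This is a hedged reconstruction, not the author's script: the actual training code is not part of this card, and `output_dir` is a placeholder. A TRL `DPOTrainer` would typically consume these arguments together with the PEFT adapter.

```python
from transformers import TrainingArguments

# Hedged sketch of the hyperparameter list above; output_dir is a placeholder.
args = TrainingArguments(
    output_dir="model_usp2_dpo9",
    learning_rate=5e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=4,  # 4 per device x 4 steps = total batch 16
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```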
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1118 | 2.67 | 100 | 1.5049 | 0.5063 | -2.6419 | 0.6100 | 3.1482 | -112.1433 | -108.1832 | -0.0840 | -0.0663 |
| 0.0485 | 5.33 | 200 | 3.9547 | -16.7119 | -24.8420 | 0.6600 | 8.1301 | -136.8102 | -127.3146 | -0.4936 | -0.4252 |
| 0.0933 | 8.0 | 300 | 2.7780 | 6.4777 | 2.8373 | 0.6200 | 3.6404 | -106.0553 | -101.5483 | 0.0824 | 0.1237 |
| 0.0001 | 10.67 | 400 | 3.9997 | -24.6430 | -30.7416 | 0.6600 | 6.0986 | -143.3652 | -136.1268 | -0.5838 | -0.5341 |
| 0.0 | 13.33 | 500 | 3.0680 | -16.6486 | -22.4522 | 0.6900 | 5.8036 | -134.1547 | -127.2442 | -0.3725 | -0.3021 |
| 0.0 | 16.0 | 600 | 3.0371 | -16.6460 | -22.4827 | 0.6900 | 5.8367 | -134.1887 | -127.2413 | -0.3725 | -0.3022 |
| 0.0 | 18.67 | 700 | 3.0540 | -16.6424 | -22.4815 | 0.6900 | 5.8391 | -134.1873 | -127.2373 | -0.3728 | -0.3028 |
| 0.0 | 21.33 | 800 | 3.0298 | -16.6187 | -22.4938 | 0.6900 | 5.8750 | -134.2010 | -127.2110 | -0.3731 | -0.3028 |
| 0.0 | 24.0 | 900 | 3.0554 | -16.6241 | -22.4802 | 0.6900 | 5.8561 | -134.1858 | -127.2169 | -0.3725 | -0.3027 |
| 0.0 | 26.67 | 1000 | 3.0422 | -16.6521 | -22.4980 | 0.6900 | 5.8460 | -134.2057 | -127.2480 | -0.3729 | -0.3027 |
### Framework versions
- PEFT 0.10.0
- Transformers 4.39.3
- Pytorch 2.2.2+cu121
- Datasets 2.18.0
- Tokenizers 0.15.2
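To reproduce the environment, one option is to pin the versions listed above; the second line assumes PyTorch's standard wheel index for CUDA 12.1 builds.

```shell
# Hedged sketch: pin the framework versions listed in this card.
pip install peft==0.10.0 transformers==4.39.3 datasets==2.18.0 tokenizers==0.15.2
pip install torch==2.2.2 --index-url https://download.pytorch.org/whl/cu121
```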