model_usp4_dpo5

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset, trained with DPO (Direct Preference Optimization), as the model name and the reward metrics below indicate. It achieves the following results on the evaluation set:

  • Loss: 2.0392
  • Rewards/chosen: -5.3723
  • Rewards/rejected: -8.4336
  • Rewards/accuracies: 0.6400
  • Rewards/margins: 3.0613
  • Logps/rejected: -127.9703
  • Logps/chosen: -121.9222
  • Logits/rejected: -0.8515
  • Logits/chosen: -0.7928
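
In DPO, the reward margin is simply the chosen reward minus the rejected reward, so the reported metrics can be sanity-checked directly. A minimal check using the values copied from the evaluation set above:

```python
# Values copied from the evaluation results above.
rewards_chosen = -5.3723
rewards_rejected = -8.4336
reported_margin = 3.0613

# Rewards/margins is defined as reward(chosen) - reward(rejected).
margin = rewards_chosen - rewards_rejected
assert abs(margin - reported_margin) < 1e-4

# Rewards/accuracies = 0.64 means the chosen response received the
# higher implicit reward on 64% of evaluation preference pairs.
```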

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
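
How the effective batch size and the learning-rate schedule fall out of these hyperparameters can be sketched in pure Python (no training framework assumed; `lr_at` is a hypothetical helper mirroring the standard linear-warmup + cosine-decay formula):

```python
import math

# Hyperparameters from the list above.
learning_rate = 5e-4
train_batch_size = 4
gradient_accumulation_steps = 4
warmup_steps = 100
training_steps = 1000

# Effective (total) train batch size per optimizer step.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
assert total_train_batch_size == 16

def lr_at(step: int) -> float:
    """Linear warmup to the peak LR, then cosine decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    progress = (step - warmup_steps) / (training_steps - warmup_steps)
    return learning_rate * 0.5 * (1.0 + math.cos(math.pi * progress))

assert lr_at(0) == 0.0
assert lr_at(warmup_steps) == learning_rate   # peak LR at end of warmup
assert abs(lr_at(training_steps)) < 1e-12     # fully decayed at step 1000
```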

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0857        | 2.67  | 100  | 1.4158          | -6.1266        | -7.7828          | 0.5700             | 1.6562          | -126.6688      | -123.4309    | -0.2615         | -0.2304       |
| 0.0373        | 5.33  | 200  | 2.0473          | -10.8748       | -14.0013         | 0.6400             | 3.1265          | -139.1058      | -132.9272    | -1.1231         | -1.1327       |
| 0.0061        | 8.0   | 300  | 2.3674          | -11.1453       | -14.1832         | 0.5900             | 3.0378          | -139.4695      | -133.4684    | -0.8038         | -0.7431       |
| 0.0004        | 10.67 | 400  | 2.0235          | -4.6284        | -7.5396          | 0.6500             | 2.9112          | -126.1823      | -120.4344    | -0.8446         | -0.7851       |
| 0.0           | 13.33 | 500  | 2.0425          | -5.3605        | -8.3967          | 0.6400             | 3.0362          | -127.8966      | -121.8987    | -0.8512         | -0.7922       |
| 0.0           | 16.0  | 600  | 2.0426          | -5.3772        | -8.4171          | 0.6400             | 3.0399          | -127.9373      | -121.9320    | -0.8517         | -0.7927       |
| 0.0           | 18.67 | 700  | 2.0478          | -5.3866        | -8.4190          | 0.6400             | 3.0323          | -127.9411      | -121.9509    | -0.8520         | -0.7932       |
| 0.0           | 21.33 | 800  | 2.0499          | -5.3884        | -8.4250          | 0.6400             | 3.0366          | -127.9531      | -121.9544    | -0.8517         | -0.7929       |
| 0.0           | 24.0  | 900  | 2.0375          | -5.3727        | -8.4358          | 0.6400             | 3.0631          | -127.9748      | -121.9230    | -0.8519         | -0.7930       |
| 0.0           | 26.67 | 1000 | 2.0392          | -5.3723        | -8.4336          | 0.6400             | 3.0613          | -127.9703      | -121.9222    | -0.8515         | -0.7928       |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.39.3
  • Pytorch 2.2.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2
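
The PEFT version listed above implies this checkpoint is a lightweight adapter rather than a full model. A minimal loading sketch, assuming `peft` and `transformers` are installed and you have access to the gated base model (the repo ids below are taken from this card; the `load_model` helper is illustrative, not part of any library):

```python
ADAPTER_ID = "guoyu-zhang/model_usp4_dpo5"
BASE_MODEL_ID = "meta-llama/Llama-2-7b-chat-hf"

def load_model():
    # Imports kept local so this sketch only requires peft/transformers
    # at the point where the weights are actually downloaded.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID)
    # Attach the DPO-trained adapter on top of the base chat model.
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    return tokenizer, model
```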
Model tree for guoyu-zhang/model_usp4_dpo5

This model is a PEFT adapter of meta-llama/Llama-2-7b-chat-hf.