model_hh_usp2_400

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.1146
  • Rewards/chosen: -10.1910
  • Rewards/rejected: -12.8552
  • Rewards/accuracies: 0.5700
  • Rewards/margins: 2.6642
  • Logps/rejected: -130.0886
  • Logps/chosen: -125.3267
  • Logits/rejected: 0.1734
  • Logits/chosen: 0.1410
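For readers unfamiliar with these DPO-style metrics, the reward columns are related in a simple way: the reported margin is the mean of (chosen − rejected), and indeed −10.1910 − (−12.8552) = 2.6642, while the accuracy is the fraction of pairs where the chosen response out-scores the rejected one. A minimal sketch with hypothetical per-example values (only the margin/accuracy definitions are taken from the metrics above; the numbers below are illustrative, not the actual eval data):

```python
# Hedged sketch: how Rewards/margins and Rewards/accuracies relate to
# per-example chosen/rejected rewards. Values are illustrative only.
chosen_rewards = [-10.0, -11.5, -9.0]    # hypothetical per-example rewards
rejected_rewards = [-12.0, -10.5, -13.0]

# Rewards/margins: mean of (chosen - rejected) over the eval set
margins = [c - r for c, r in zip(chosen_rewards, rejected_rewards)]
mean_margin = sum(margins) / len(margins)

# Rewards/accuracies: fraction of pairs where chosen beats rejected
accuracy = sum(c > r for c, r in zip(chosen_rewards, rejected_rewards)) / len(chosen_rewards)

print(mean_margin, accuracy)
```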

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
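The schedule settings above describe a linear warmup for 100 steps followed by cosine decay over the remaining 900. A minimal sketch of that learning-rate curve (mirroring, but not reproducing, the exact `transformers` scheduler implementation):

```python
import math

# Values taken from the hyperparameters above.
LEARNING_RATE = 5e-4   # learning_rate
WARMUP_STEPS = 100     # lr_scheduler_warmup_steps
TOTAL_STEPS = 1000     # training_steps

def lr_at(step: int) -> float:
    """Cosine schedule with linear warmup (a sketch, not the exact HF code)."""
    if step < WARMUP_STEPS:
        # linear warmup from 0 to the peak learning rate
        return LEARNING_RATE * step / WARMUP_STEPS
    # cosine decay from the peak down to 0 over the remaining steps
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0), lr_at(100), lr_at(1000))
```

Note also that total_train_batch_size = 16 is simply train_batch_size × gradient_accumulation_steps = 4 × 4.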

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0066 | 4.0 | 100 | 2.4563 | -5.8278 | -7.6948 | 0.5900 | 1.8670 | -124.3548 | -120.4786 | -0.0743 | -0.0887 |
| 0.042 | 8.0 | 200 | 2.8011 | -2.8779 | -4.7588 | 0.5300 | 1.8808 | -121.0926 | -117.2011 | 0.3427 | 0.3247 |
| 0.0009 | 12.0 | 300 | 3.2063 | -16.1959 | -19.1144 | 0.5500 | 2.9186 | -137.0433 | -131.9988 | 0.1998 | 0.1756 |
| 0.0001 | 16.0 | 400 | 3.1047 | -10.1343 | -12.7872 | 0.5800 | 2.6529 | -130.0131 | -125.2637 | 0.1757 | 0.1437 |
| 0.0 | 20.0 | 500 | 3.1359 | -10.1980 | -12.8447 | 0.5800 | 2.6467 | -130.0769 | -125.3345 | 0.1736 | 0.1412 |
| 0.0 | 24.0 | 600 | 3.1186 | -10.1842 | -12.8467 | 0.5800 | 2.6625 | -130.0792 | -125.3191 | 0.1732 | 0.1409 |
| 0.0 | 28.0 | 700 | 3.1174 | -10.2101 | -12.8729 | 0.5900 | 2.6628 | -130.1082 | -125.3479 | 0.1733 | 0.1406 |
| 0.0 | 32.0 | 800 | 3.1257 | -10.1973 | -12.8683 | 0.5900 | 2.6711 | -130.1032 | -125.3336 | 0.1735 | 0.1409 |
| 0.0 | 36.0 | 900 | 3.1112 | -10.1620 | -12.8766 | 0.5800 | 2.7147 | -130.1124 | -125.2944 | 0.1735 | 0.1413 |
| 0.0 | 40.0 | 1000 | 3.1146 | -10.1910 | -12.8552 | 0.5700 | 2.6642 | -130.0886 | -125.3267 | 0.1734 | 0.1410 |
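As a back-of-the-envelope check on the logged schedule: the epoch column advances by 4.0 for every 100 optimizer steps, i.e. 25 steps per epoch, which together with the effective batch size of 16 suggests roughly 400 training examples. This is an inference from the table, not a stated fact, and it assumes no partial final batch:

```python
# Back-of-the-envelope dataset-size estimate from the logged schedule
# (an inference, assuming no dropped or partial final batch).
steps_per_eval = 100      # Step column advances by 100 per table row
epochs_per_eval = 4.0     # Epoch column advances by 4.0 per table row
effective_batch = 4 * 4   # train_batch_size * gradient_accumulation_steps

steps_per_epoch = steps_per_eval / epochs_per_eval   # 25 optimizer steps/epoch
approx_examples = steps_per_epoch * effective_batch  # ~400 examples
print(steps_per_epoch, approx_examples)
```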

Framework versions

  • PEFT 0.10.0
  • Transformers 4.39.3
  • Pytorch 2.2.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2