model_hh_usp3_400

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset. It achieves the following results on the evaluation set (a loading sketch follows the list):

  • Loss: 3.1160
  • Rewards/chosen: -8.2855
  • Rewards/rejected: -15.5942
  • Rewards/accuracies: 0.6700
  • Rewards/margins: 7.3087
  • Logps/rejected: -130.3543
  • Logps/chosen: -121.6985
  • Logits/rejected: -0.6216
  • Logits/chosen: -0.5451
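The reward and log-probability metric names above match those logged by TRL's DPOTrainer, which suggests the adapter was trained with direct preference optimization on chosen/rejected pairs. Below is a minimal sketch of loading the adapter for generation; the base model and the repository id guoyu-zhang/model_hh_usp3_400 come from this card, while the dtype, device placement, and prompt are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base chat model, then attach the adapter from this repository.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",
    torch_dtype=torch.float16,  # assumption: fp16 inference on GPU
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "guoyu-zhang/model_hh_usp3_400")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Hypothetical prompt; the card does not document a prompt format.
prompt = "How do I keep my houseplants alive while travelling?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```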

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch in code follows the list):

  • learning_rate: 0.0005
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
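
As a hedged sketch only, the hyperparameters above map onto transformers.TrainingArguments as shown below; the output directory is a hypothetical placeholder, and the surrounding trainer wiring (model, datasets, preference-loss settings) is not documented on this card.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="model_hh_usp3_400",   # hypothetical path, not from the card
    learning_rate=5e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    warmup_steps=100,
    max_steps=1000,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08, as listed above;
    # these are also the transformers defaults.
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```

Note that the listed total_train_batch_size of 16 is derived rather than set directly: per_device_train_batch_size × gradient_accumulation_steps = 4 × 4 = 16 (on a single device).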

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:-----:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.01          | 4.0   | 100  | 1.2916          | -0.4582        | -4.3086          | 0.6700             | 3.8504          | -117.8148      | -113.0015    | -0.2184         | -0.2363       |
| 0.0779        | 8.0   | 200  | 2.2220          | -3.5887        | -8.9487          | 0.6700             | 5.3600          | -122.9704      | -116.4798    | -0.6463         | -0.6426       |
| 0.0002        | 12.0  | 300  | 2.6768          | -2.9215        | -9.1033          | 0.6700             | 6.1818          | -123.1422      | -115.7384    | -0.5538         | -0.4825       |
| 0.0           | 16.0  | 400  | 3.0879          | -8.2794        | -15.6271         | 0.6700             | 7.3476          | -130.3908      | -121.6917    | -0.6205         | -0.5443       |
| 0.0           | 20.0  | 500  | 3.0933          | -8.2829        | -15.6299         | 0.6700             | 7.3470          | -130.3939      | -121.6956    | -0.6209         | -0.5444       |
| 0.0           | 24.0  | 600  | 3.0984          | -8.2550        | -15.6140         | 0.6800             | 7.3590          | -130.3763      | -121.6645    | -0.6208         | -0.5443       |
| 0.0           | 28.0  | 700  | 3.0852          | -8.2794        | -15.5895         | 0.6800             | 7.3102          | -130.3491      | -121.6916    | -0.6204         | -0.5440       |
| 0.0           | 32.0  | 800  | 3.0838          | -8.2687        | -15.6392         | 0.6700             | 7.3705          | -130.4043      | -121.6798    | -0.6212         | -0.5448       |
| 0.0           | 36.0  | 900  | 3.0836          | -8.2681        | -15.6105         | 0.6700             | 7.3424          | -130.3724      | -121.6791    | -0.6211         | -0.5444       |
| 0.0           | 40.0  | 1000 | 3.1160          | -8.2855        | -15.5942         | 0.6700             | 7.3087          | -130.3543      | -121.6985    | -0.6216         | -0.5451       |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.39.3
  • Pytorch 2.2.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2
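
As a small sanity check, assuming these packages are importable in the current environment, the installed versions can be compared against the list above:

```python
import datasets, peft, tokenizers, torch, transformers

# Versions listed on this card; the torch version includes the CUDA build tag.
expected = {
    "peft": "0.10.0",
    "transformers": "4.39.3",
    "torch": "2.2.2+cu121",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}
for name, module in [("peft", peft), ("transformers", transformers),
                     ("torch", torch), ("datasets", datasets),
                     ("tokenizers", tokenizers)]:
    status = "OK" if module.__version__ == expected[name] else "MISMATCH"
    print(f"{name}: installed {module.__version__}, expected {expected[name]} [{status}]")
```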