llama3-8b-mypo3_sim-full-beta10.0-lr4e-7

This model is a fine-tuned version of princeton-nlp/Llama-3-Base-8B-SFT on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set (a minimal usage sketch follows the metrics):

  • Loss: 1.3724
  • Rewards/chosen: 0.0743
  • Rewards/rejected: -0.3491
  • Rewards/accuracies: 0.7659
  • Rewards/margins: 0.4234
  • Logps/rejected: -1.5243
  • Logps/chosen: -1.2637
  • Logits/rejected: -1.0913
  • Logits/chosen: -1.0660
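
As noted above, here is a minimal sketch of loading the checkpoint for text generation with the transformers library. The prompt and generation settings are illustrative only, not recommendations from the model authors, and no chat template is assumed.

```python
# Minimal, illustrative usage sketch (not an official example from the model authors).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "aaaalongaa/llama3-8b-mypo3_sim-full-beta10.0-lr4e-7"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in bfloat16
    device_map="auto",
)

prompt = "Explain the difference between supervised fine-tuning and preference optimization."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```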

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 4e-07
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
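
The total train batch size of 32 is the product of 4 examples per device × 4 GPUs × 2 gradient-accumulation steps. As a rough guide, the sketch below expresses the listed values as a transformers.TrainingArguments configuration; it is an illustration, not the original training script, and the preference-optimization loss itself (presumably SimPO-style with beta = 10.0, going by the model name) is not reproduced here.

```python
# Sketch of the listed hyperparameters as transformers.TrainingArguments.
# This is NOT the original training code; the preference loss lives in a
# separate trainer that is not part of this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3-8b-mypo3_sim-full-beta10.0-lr4e-7",
    learning_rate=4e-7,
    per_device_train_batch_size=4,   # train_batch_size
    per_device_eval_batch_size=8,    # eval_batch_size
    gradient_accumulation_steps=2,   # 4 GPUs x 4 x 2 = 32 total train batch size
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,                       # checkpoint is saved in bfloat16
)
```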

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 1.3853        | 0.0523 | 100  | 1.3812          | -0.0333        | -0.1018          | 0.6448             | 0.0685          | -1.4996        | -1.2745      | -1.0407         | -1.0093       |
| 1.4035        | 0.1047 | 200  | 1.3794          | -0.0391        | -0.2185          | 0.7083             | 0.1794          | -1.5113        | -1.2751      | -1.0427         | -1.0131       |
| 1.3833        | 0.1570 | 300  | 1.3844          | 0.0092         | -0.2547          | 0.7063             | 0.2639          | -1.5149        | -1.2702      | -1.0703         | -1.0412       |
| 1.3934        | 0.2094 | 400  | 1.3933          | 0.0824         | -0.2364          | 0.7262             | 0.3188          | -1.5130        | -1.2629      | -1.0709         | -1.0425       |
| 1.4106        | 0.2617 | 500  | 1.4109          | 0.1926         | -0.1813          | 0.7202             | 0.3739          | -1.5075        | -1.2519      | -1.0638         | -1.0374       |
| 1.4054        | 0.3141 | 600  | 1.3984          | 0.0015         | -0.3629          | 0.7361             | 0.3644          | -1.5257        | -1.2710      | -1.0834         | -1.0564       |
| 1.3595        | 0.3664 | 700  | 1.3980          | -0.0178        | -0.4030          | 0.7282             | 0.3853          | -1.5297        | -1.2729      | -1.0677         | -1.0416       |
| 1.4312        | 0.4187 | 800  | 1.3940          | 0.0198         | -0.3635          | 0.7321             | 0.3833          | -1.5258        | -1.2692      | -1.0938         | -1.0670       |
| 1.3978        | 0.4711 | 900  | 1.3915          | 0.0711         | -0.3160          | 0.7440             | 0.3871          | -1.5210        | -1.2640      | -1.0902         | -1.0633       |
| 1.3815        | 0.5234 | 1000 | 1.3852          | 0.1178         | -0.2864          | 0.7520             | 0.4042          | -1.5180        | -1.2594      | -1.0909         | -1.0657       |
| 1.378         | 0.5758 | 1100 | 1.3877          | 0.1649         | -0.2299          | 0.7440             | 0.3947          | -1.5124        | -1.2547      | -1.0936         | -1.0682       |
| 1.3868        | 0.6281 | 1200 | 1.3771          | 0.0725         | -0.3378          | 0.7480             | 0.4103          | -1.5232        | -1.2639      | -1.0760         | -1.0512       |
| 1.3653        | 0.6805 | 1300 | 1.3791          | 0.0379         | -0.3772          | 0.7460             | 0.4152          | -1.5271        | -1.2674      | -1.0568         | -1.0335       |
| 1.3524        | 0.7328 | 1400 | 1.3824          | 0.1389         | -0.2862          | 0.7440             | 0.4250          | -1.5180        | -1.2573      | -1.0780         | -1.0533       |
| 1.3716        | 0.7851 | 1500 | 1.3744          | 0.0827         | -0.3438          | 0.7480             | 0.4265          | -1.5238        | -1.2629      | -1.0928         | -1.0670       |
| 1.3846        | 0.8375 | 1600 | 1.3734          | 0.0947         | -0.3299          | 0.7520             | 0.4246          | -1.5224        | -1.2617      | -1.0883         | -1.0631       |
| 1.3631        | 0.8898 | 1700 | 1.3721          | 0.0610         | -0.3667          | 0.7619             | 0.4277          | -1.5261        | -1.2650      | -1.0636         | -1.0404       |
| 1.3646        | 0.9422 | 1800 | 1.3719          | 0.0800         | -0.3489          | 0.7639             | 0.4289          | -1.5243        | -1.2631      | -1.0865         | -1.0616       |
| 1.3606        | 0.9945 | 1900 | 1.3722          | 0.0718         | -0.3524          | 0.7560             | 0.4241          | -1.5246        | -1.2640      | -1.0887         | -1.0636       |
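
For readers unfamiliar with these columns: in the usual DPO/SimPO-style logging convention (assumed here, and consistent with the numbers above), the reward margin is the chosen reward minus the rejected reward, and the reward accuracy is the fraction of pairs whose chosen reward exceeds the rejected one. A small sketch of that relationship:

```python
# Sketch of how the reward columns relate (standard preference-optimization
# logging convention; the exact training code for this run is not published here).
import torch

def reward_metrics(chosen_rewards: torch.Tensor, rejected_rewards: torch.Tensor) -> dict:
    # Per-pair margin: chosen reward minus rejected reward.
    margins = chosen_rewards - rejected_rewards
    return {
        "rewards/chosen": chosen_rewards.mean().item(),
        "rewards/rejected": rejected_rewards.mean().item(),
        "rewards/margins": margins.mean().item(),
        "rewards/accuracies": (chosen_rewards > rejected_rewards).float().mean().item(),
    }

# Example with the final evaluation values reported above:
# 0.0743 - (-0.3491) = 0.4234, matching Rewards/margins.
```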

Framework versions

  • Transformers 4.43.1
  • PyTorch 2.1.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.19.1