Llama-3.1-8B-Instruct-KTO-500

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the bct_non_cot_kto_500 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2733
  • Rewards/chosen: -0.7733
  • Logps/chosen: -22.9961
  • Logits/chosen: -5270992.0
  • Rewards/rejected: -5.5785
  • Logps/rejected: -75.2760
  • Logits/rejected: -6459799.04
  • Rewards/margins: 4.8052
  • KL: 0.0
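
For reference, Rewards/margins is Rewards/chosen minus Rewards/rejected: -0.7733 - (-5.5785) = 4.8052. The same relation holds row by row in the training results table below.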

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
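
These values map directly onto transformers.TrainingArguments fields. As a minimal sketch only, here is how a comparable run could be set up with TRL's KTOTrainer; the card does not state the training framework, and the dataset path, LoRA settings, and adapter configuration below are assumptions, not taken from the card.

```python
# Hypothetical reproduction sketch -- the card does not state the training framework.
# Dataset file name and LoRA settings are placeholders, not taken from the card.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

base = "meta-llama/Meta-Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

args = KTOConfig(
    output_dir="Llama-3.1-8B-Instruct-KTO-500",
    learning_rate=5e-6,
    per_device_train_batch_size=2,   # train_batch_size
    per_device_eval_batch_size=2,    # eval_batch_size
    gradient_accumulation_steps=8,   # 2 x 8 = 16 total train batch size
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    optim="adamw_torch",
    seed=42,
)

# KTO expects an unpaired dataset with "prompt", "completion", and boolean
# "label" columns; this file name stands in for bct_non_cot_kto_500.
train_dataset = load_dataset("json", data_files="bct_non_cot_kto_500.json")["train"]

trainer = KTOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older TRL releases use tokenizer= instead
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # assumed adapter config
)
trainer.train()
```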

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Logps/chosen | Logits/chosen | Rewards/rejected | Logps/rejected | Logits/rejected | Rewards/margins | KL      |
|:--------------|:-------|:-----|:----------------|:---------------|:-------------|:--------------|:-----------------|:---------------|:----------------|:----------------|:--------|
| 0.499         | 0.8889 | 50   | 0.4996          | 0.0154         | -15.1082     | -6797012.48   | 0.0162           | -19.3286       | -7105245.44     | -0.0008         | 3.8204  |
| 0.4715        | 1.7778 | 100  | 0.4705          | 0.2738         | -12.5245     | -6498615.68   | 0.0353           | -19.1378       | -7022910.72     | 0.2385          | 12.4696 |
| 0.3459        | 2.6667 | 150  | 0.3755          | 0.0903         | -14.3596     | -5608189.44   | -1.1001          | -30.4920       | -6636171.52     | 1.1905          | 0.7521  |
| 0.2879        | 3.5556 | 200  | 0.3254          | 0.0251         | -15.0117     | -5304341.76   | -2.3085          | -42.5760       | -6579046.4      | 2.3336          | 0.0     |
| 0.2319        | 4.4444 | 250  | 0.3015          | -0.1526        | -16.7888     | -5472120.32   | -3.5465          | -54.9555       | -6862511.36     | 3.3939          | 3.7907  |
| 0.1971        | 5.3333 | 300  | 0.2927          | -0.4757        | -20.0192     | -5352769.92   | -4.5843          | -65.3341       | -6700284.16     | 4.1087          | 0.0     |
| 0.1825        | 6.2222 | 350  | 0.2855          | -0.6676        | -21.9389     | -5317811.2    | -5.1406          | -70.8970       | -6587809.28     | 4.4730          | 0.0     |
| 0.1996        | 7.1111 | 400  | 0.2804          | -0.6851        | -22.1139     | -5272716.48   | -5.2919          | -72.4094       | -6501901.44     | 4.6068          | 0.0     |
| 0.1776        | 8.0    | 450  | 0.2753          | -0.7492        | -22.7551     | -5282150.4    | -5.4837          | -74.3277       | -6475059.2      | 4.7345          | 0.0     |
| 0.208         | 8.8889 | 500  | 0.2733          | -0.7733        | -22.9961     | -5270992.0    | -5.5785          | -75.2760       | -6459799.04     | 4.8052          | 0.0     |
| 0.2075        | 9.7778 | 550  | 0.2766          | -0.7819        | -23.0821     | -5283820.8    | -5.5377          | -74.8675       | -6485849.6      | 4.7557          | 0.0     |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3
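
Since the repository ships a PEFT adapter rather than full model weights, it is loaded on top of the base model. A minimal usage sketch using standard transformers/peft APIs; the prompt and generation settings are illustrative only:

```python
# Load the adapter on top of the base model and run a short chat generation.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "chchen/Llama-3.1-8B-Instruct-KTO-500")
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Illustrative prompt; use the Llama 3.1 chat template for best results.
messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```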