Llama-3.1-8B-Instruct-KTO-900

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the bct_non_cot_kto_900 dataset. It achieves the following results on the evaluation set (the metric definitions are sketched after the list):

  • Loss: 0.2336
  • Rewards/chosen: -0.7387
  • Logps/chosen: -23.9743
  • Logits/chosen: -3936346.7253
  • Rewards/rejected: -8.0820
  • Logps/rejected: -101.9145
  • Logits/rejected: -5595913.7079
  • Rewards/margins: 7.3434
  • Kl: 0.0
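
These metrics appear to follow TRL's KTO logging conventions (an assumption, inferred from the PEFT/Transformers versions listed below): the rewards/* values are β-scaled policy-vs-reference log-probability ratios on the chosen and rejected completions, and Kl is the batch estimate of the policy–reference KL divergence used as the reference point z_0 in the KTO objective (Ethayarajh et al., 2024). A sketch of the definitions:

```latex
% KTO objective (Ethayarajh et al., 2024); notation reconstructed from the
% paper, not from this card. \sigma is the logistic function.
r_\theta(x, y) = \log \frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
\qquad
z_0 = \mathbb{E}_{x'}\!\left[\mathrm{KL}\big(\pi_\theta(y' \mid x') \,\|\, \pi_{\mathrm{ref}}(y' \mid x')\big)\right]

v(x, y) =
\begin{cases}
\lambda_D \,\sigma\big(\beta\,(r_\theta(x, y) - z_0)\big) & \text{if } y \text{ is desirable (chosen)} \\
\lambda_U \,\sigma\big(\beta\,(z_0 - r_\theta(x, y))\big) & \text{if } y \text{ is undesirable (rejected)}
\end{cases}

\mathcal{L}_{\mathrm{KTO}}(\pi_\theta, \pi_{\mathrm{ref}}) = \mathbb{E}_{(x, y) \sim D}\big[\lambda_y - v(x, y)\big]
```

Under this reading, Rewards/margins is Rewards/chosen minus Rewards/rejected, which matches the numbers above: -0.7387 - (-8.0820) ≈ 7.3434.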

Model description

More information needed

Intended uses & limitations

More information needed
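
Until this section is filled in, here is a minimal inference sketch. It assumes the checkpoint is a PEFT adapter for meta-llama/Meta-Llama-3.1-8B-Instruct (consistent with the base model and the PEFT framework version listed below), hosted as chchen/Llama-3.1-8B-Instruct-KTO-900; adjust the dtype and device settings for your hardware.

```python
# Minimal loading/inference sketch (assumption: this repo is a PEFT adapter
# on top of meta-llama/Meta-Llama-3.1-8B-Instruct).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "chchen/Llama-3.1-8B-Instruct-KTO-900"  # this repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the KTO adapter
model.eval()

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    out = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```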

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a hedged TRL sketch mirroring them follows the list):

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
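
For orientation, a hypothetical reconstruction of how these hyperparameters would map onto TRL's KTOTrainer. The actual training script is not published with this card; the dataset loading path and LoRA settings below are illustrative placeholders, and TRL expects KTO data as prompt/completion/label records.

```python
# Hypothetical reconstruction of the training setup using TRL's KTOTrainer;
# everything not listed in the hyperparameters above is an assumption.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)

# Dataset name from the card; the file path and format here are placeholders.
dataset = load_dataset("json", data_files="bct_non_cot_kto_900.json")["train"]

args = KTOConfig(
    output_dir="Llama-3.1-8B-Instruct-KTO-900",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,  # 2 per device x 8 steps = 16 effective
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",
)

# Illustrative LoRA adapter config; the real adapter settings are not listed.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)

trainer = KTOTrainer(
    model=model,                 # with a PEFT adapter, the frozen base model
    args=args,                   # serves as the implicit reference model
    train_dataset=dataset,
    processing_class=tokenizer,  # older TRL versions use tokenizer=...
    peft_config=peft_config,
)
trainer.train()
```

The listed total_train_batch_size of 16 follows from a per-device batch of 2 with 8 gradient-accumulation steps on a single device.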

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Logps/chosen | Logits/chosen | Rewards/rejected | Logps/rejected | Logits/rejected | Rewards/margins | Kl |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4998 | 0.4938 | 50 | 0.4998 | 0.0071 | -16.5171 | -6513294.0659 | 0.0060 | -21.0342 | -8309430.6517 | 0.0011 | 7.3805 |
| 0.4916 | 0.9877 | 100 | 0.4882 | 0.0450 | -16.1375 | -6392945.9341 | -0.0510 | -21.6039 | -8292936.6292 | 0.0960 | 1.8784 |
| 0.388 | 1.4815 | 150 | 0.3815 | 0.1292 | -15.2958 | -5358742.5055 | -0.9531 | -30.6250 | -7796024.0899 | 1.0823 | 3.1272 |
| 0.3027 | 1.9753 | 200 | 0.2895 | 0.2877 | -13.7103 | -4776840.7912 | -2.2363 | -43.4567 | -7301079.7303 | 2.5240 | 0.0 |
| 0.2489 | 2.4691 | 250 | 0.2521 | 0.2507 | -14.0810 | -4633505.4066 | -3.7515 | -58.6088 | -6882766.3820 | 4.0022 | 0.0 |
| 0.2086 | 2.9630 | 300 | 0.2438 | 0.0154 | -16.4339 | -4421153.7582 | -5.2262 | -73.3562 | -6361427.4157 | 5.2416 | 0.0 |
| 0.1899 | 3.4568 | 350 | 0.2428 | -0.2031 | -18.6187 | -4157514.5495 | -5.8225 | -79.3186 | -5951174.4719 | 5.6194 | 0.0 |
| 0.25 | 3.9506 | 400 | 0.2462 | -0.2869 | -19.4569 | -4007848.4396 | -6.0265 | -81.3591 | -5686904.8090 | 5.7396 | 0.0 |
| 0.2039 | 4.4444 | 450 | 0.2406 | -0.3700 | -20.2878 | -3949201.2308 | -6.5107 | -86.2007 | -5679643.3258 | 6.1407 | 0.0 |
| 0.1757 | 4.9383 | 500 | 0.2408 | -0.5367 | -21.9549 | -3883009.7582 | -7.2303 | -93.3975 | -5592792.4494 | 6.6936 | 0.0 |
| 0.1745 | 5.4321 | 550 | 0.2381 | -0.4705 | -21.2929 | -3896917.0989 | -6.9828 | -90.9218 | -5625497.8876 | 6.5123 | 0.0 |
| 0.1966 | 5.9259 | 600 | 0.2399 | -0.6421 | -23.0091 | -3685890.8132 | -7.6576 | -97.6701 | -5304065.7978 | 7.0155 | 0.0 |
| 0.1717 | 6.4198 | 650 | 0.2365 | -0.6360 | -22.9479 | -3976765.8901 | -7.6226 | -97.3204 | -5631893.5730 | 6.9866 | 0.0 |
| 0.1746 | 6.9136 | 700 | 0.2336 | -0.7387 | -23.9743 | -3936346.7253 | -8.0820 | -101.9145 | -5595913.7079 | 7.3434 | 0.0 |
| 0.1586 | 7.4074 | 750 | 0.2342 | -0.7033 | -23.6211 | -3965146.7253 | -7.9157 | -100.2512 | -5631508.8539 | 7.2124 | 0.0 |
| 0.1651 | 7.9012 | 800 | 0.2346 | -0.7270 | -23.8581 | -3923032.9670 | -8.0741 | -101.8346 | -5564442.2472 | 7.3470 | 0.0 |
| 0.1761 | 8.3951 | 850 | 0.2382 | -0.8088 | -24.6761 | -3866978.8132 | -8.2918 | -104.0118 | -5499030.6517 | 7.4829 | 0.0 |
| 0.1829 | 8.8889 | 900 | 0.2382 | -0.8048 | -24.6361 | -3876442.3736 | -8.3157 | -104.2511 | -5500599.3708 | 7.5109 | 0.0 |
| 0.1621 | 9.3827 | 950 | 0.2386 | -0.8111 | -24.6987 | -3862863.1209 | -8.3110 | -104.2041 | -5483545.5281 | 7.4999 | 0.0 |
| 0.1824 | 9.8765 | 1000 | 0.2388 | -0.8165 | -24.7532 | -3883063.5604 | -8.3197 | -104.2909 | -5463227.6854 | 7.5031 | 0.0 |

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3