Llama-3.1-8B-Instruct-KTO-800

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3.1-8B-Instruct on the bct_non_cot_kto_800 dataset. It achieves the following results on the evaluation set (a loading sketch follows the metrics):

  • Loss: 0.2268
  • Rewards/chosen: -0.2424
  • Logps/chosen: -18.9583
  • Logits/chosen: -5380960.3636
  • Rewards/rejected: -7.7435
  • Logps/rejected: -97.2023
  • Logits/rejected: -6542436.0
  • Rewards/margins: 7.5011 (Rewards/chosen - Rewards/rejected)
  • Kl: 0.0
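
Since the framework versions below list PEFT, the released weights are an adapter rather than a full checkpoint. The snippet below is a minimal loading-and-generation sketch; it assumes the adapter is published as chchen/Llama-3.1-8B-Instruct-KTO-800, and the dtype and device settings are illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model, then attach this KTO-trained adapter on top of it.
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B-Instruct",
    torch_dtype=torch.bfloat16,  # illustrative; any supported dtype works
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
model = PeftModel.from_pretrained(base, "chchen/Llama-3.1-8B-Instruct-KTO-800")  # assumed repo id

# Generate with the instruct model's chat template.
messages = [{"role": "user", "content": "Summarize KTO fine-tuning in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```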

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a hedged configuration sketch follows the list:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
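
The Rewards/chosen, Rewards/rejected, and Kl columns in the results below match the metrics logged by TRL's KTOTrainer, so the sketch below shows how these hyperparameters would map onto a KTOConfig. TRL itself is an assumption (it is not listed under framework versions), and the model, tokenizer, and dataset variables are hypothetical placeholders.

```python
from trl import KTOConfig, KTOTrainer

training_args = KTOConfig(
    output_dir="Llama-3.1-8B-Instruct-KTO-800",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,  # 2 x 8 = total train batch size of 16
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,               # ~90 of the 900 optimizer steps shown below
    num_train_epochs=10.0,
    seed=42,
)

trainer = KTOTrainer(
    model=model,                  # e.g. the PEFT-wrapped model from the sketch above
    args=training_args,
    train_dataset=train_dataset,  # hypothetical: splits of bct_non_cot_kto_800
    eval_dataset=eval_dataset,
    processing_class=tokenizer,
)
trainer.train()
```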

Training results

Training Loss Epoch Step Validation Loss Rewards/chosen Logps/chosen Logits/chosen Rewards/rejected Logps/rejected Logits/rejected Rewards/margins Kl
0.4996 0.5556 50 0.5001 0.0018 -16.5156 -7144997.8182 0.0023 -19.7443 -7765536.0 -0.0004 3.2831
0.4854 1.1111 100 0.4841 0.1104 -15.4301 -7037787.6364 -0.0068 -19.8353 -7746840.8889 0.1172 5.6073
0.3877 1.6667 150 0.3919 0.0166 -16.3681 -6113848.7273 -1.1030 -30.7975 -7278912.4444 1.1196 1.6923
0.2713 2.2222 200 0.2989 0.2683 -13.8509 -5712961.0909 -2.3006 -42.7735 -6877109.3333 2.5690 1.5282
0.2298 2.7778 250 0.2562 0.2113 -14.4207 -5691608.7273 -4.0499 -60.2658 -6756057.7778 4.2612 1.6214
0.2023 3.3333 300 0.2438 0.0519 -16.0149 -5509416.3636 -5.4444 -74.2112 -6538183.1111 5.4963 2.0586
0.2091 3.8889 350 0.2401 -0.0056 -16.5904 -5302801.0909 -6.0992 -80.7588 -6321514.2222 6.0935 1.2333
0.1803 4.4444 400 0.2313 -0.0036 -16.5705 -5251804.7273 -6.5164 -84.9310 -6353763.5556 6.5127 0.9655
0.1882 5.0 450 0.2316 -0.0895 -17.4291 -5285734.9091 -6.9674 -89.4410 -6387964.4444 6.8779 0.8871
0.2097 5.5556 500 0.2321 -0.0880 -17.4141 -5317437.0909 -6.9551 -89.3176 -6466442.2222 6.8671 0.8415
0.2101 6.1111 550 0.2369 -0.2595 -19.1287 -5358458.1818 -7.4899 -94.6661 -6556113.7778 7.2304 0.6120
0.2205 6.6667 600 0.2306 -0.0927 -17.4612 -5311256.0 -7.1726 -91.4927 -6522494.2222 7.0798 2.9167
0.2015 7.2222 650 0.2278 -0.2235 -18.7694 -5318941.0909 -7.7564 -97.3308 -6520522.6667 7.5328 3.1473
0.1847 7.7778 700 0.2302 -0.2017 -18.5512 -5325900.7273 -7.6276 -96.0427 -6506082.2222 7.4258 0.0
0.1755 8.3333 750 0.2296 -0.2041 -18.5748 -5375173.4545 -7.6845 -96.6120 -6566019.5556 7.4804 0.0
0.1484 8.8889 800 0.2270 -0.2105 -18.6391 -5378953.8182 -7.7307 -97.0744 -6566795.5556 7.5202 0.0
0.2069 9.4444 850 0.2268 -0.2424 -18.9583 -5380960.3636 -7.7435 -97.2023 -6542436.0 7.5011 0.0
0.1825 10.0 900 0.2275 -0.2352 -18.8865 -5396778.1818 -7.7588 -97.3554 -6567349.3333 7.5236 0.0

The evaluation metrics reported at the top of this card correspond to the step-850 checkpoint, which reaches the lowest validation loss (0.2268).

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • Pytorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3