Llama-3.1-8B-Instruct-KTO-700

This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the bct_non_cot_kto_700 dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2141
  • Rewards/chosen: -0.1237
  • Logps/chosen: -17.0960
  • Logits/chosen: -4337002.6667
  • Rewards/rejected: -6.8517
  • Logps/rejected: -87.8667
  • Logits/rejected: -6325528.9351
  • Rewards/margins: 6.7280
  • KL: 0.0
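
Since the Framework versions below list PEFT and the checkpoint is published as an adapter, it must be loaded on top of the base model rather than used standalone. A minimal inference sketch, assuming the hub repo id chchen/Llama-3.1-8B-Instruct-KTO-700 and a GPU with bfloat16 support:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"
ADAPTER_ID = "chchen/Llama-3.1-8B-Instruct-KTO-700"

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the KTO adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)

messages = [{"role": "user", "content": "Hello!"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```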

Model description

This model is a PEFT (LoRA-style) adapter for meta-llama/Meta-Llama-3.1-8B-Instruct, trained with KTO (Kahneman-Tversky Optimization) preference tuning on the bct_non_cot_kto_700 dataset.

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08); no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
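
The card does not include the training script, but the hyperparameters above map directly onto TRL's KTOTrainer. A minimal sketch under stated assumptions: the dataset path and column layout, the LoRA settings, and the output directory are placeholders (the card only confirms that a PEFT adapter was trained), and KTOConfig defaults such as beta are left untouched:

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

MODEL_ID = "meta-llama/Meta-Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Hyperparameters copied from the list above.
args = KTOConfig(
    output_dir="Llama-3.1-8B-Instruct-KTO-700",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,  # 2 x 8 = 16 total train batch size
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
)

# LoRA settings are illustrative; the card does not state r/alpha/targets.
peft_config = LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32)

# Placeholder path; KTO expects "prompt"/"completion"/"label" columns.
train_dataset = load_dataset("json", data_files="bct_non_cot_kto_700.json")["train"]

trainer = KTOTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # named `tokenizer=` in older trl releases
    peft_config=peft_config,
)
trainer.train()
```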

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Logps/chosen | Logits/chosen | Rewards/rejected | Logps/rejected | Logits/rejected | Rewards/margins | KL |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4998 | 0.6349 | 50 | 0.4994 | 0.0114 | -15.7443 | -6144470.3492 | 0.0050 | -19.2993 | -7140612.1558 | 0.0064 | 4.0692 |
| 0.4897 | 1.2730 | 100 | 0.4845 | 0.1738 | -14.1208 | -6027973.0794 | 0.0297 | -19.0521 | -7100376.1039 | 0.1440 | 8.7304 |
| 0.3937 | 1.9079 | 150 | 0.3745 | 0.3253 | -12.6062 | -5056723.3016 | -0.7501 | -26.8503 | -6579766.0260 | 1.0753 | 0.8562 |
| 0.2821 | 2.5460 | 200 | 0.2748 | 0.3148 | -12.7111 | -4519878.6032 | -2.1742 | -41.0916 | -6449476.5714 | 2.4890 | 0.0 |
| 0.2081 | 3.1841 | 250 | 0.2370 | 0.3186 | -12.6723 | -4612124.9524 | -3.4610 | -53.9597 | -6554266.5974 | 3.7797 | 0.0 |
| 0.2651 | 3.8190 | 300 | 0.2267 | 0.0307 | -15.5522 | -4312815.7460 | -5.2909 | -72.2581 | -6286123.2208 | 5.3215 | 0.0 |
| 0.194 | 4.4571 | 350 | 0.2218 | 0.0573 | -15.2853 | -4256054.3492 | -5.4598 | -73.9474 | -6266110.3377 | 5.5171 | 0.0 |
| 0.168 | 5.0952 | 400 | 0.2218 | -0.0280 | -16.1385 | -4093047.3651 | -6.0281 | -79.6308 | -6134765.7143 | 6.0002 | 0.0 |
| 0.2268 | 5.7302 | 450 | 0.2163 | -0.1488 | -17.3463 | -4156519.1111 | -6.7700 | -87.0499 | -6205292.8831 | 6.6213 | 0.0 |
| 0.1915 | 6.3683 | 500 | 0.2194 | -0.0833 | -16.6922 | -4369097.6508 | -6.4897 | -84.2469 | -6383271.0649 | 6.4064 | 0.0 |
| 0.201 | 7.0063 | 550 | 0.2189 | -0.1198 | -17.0568 | -4304407.3651 | -6.7293 | -86.6421 | -6343543.6883 | 6.6095 | 0.0 |
| 0.1961 | 7.6413 | 600 | 0.2157 | -0.1073 | -16.9320 | -4324965.0794 | -6.7349 | -86.6987 | -6328156.6753 | 6.6276 | 0.0 |
| 0.1721 | 8.2794 | 650 | 0.2157 | -0.1282 | -17.1405 | -4329249.5238 | -6.8581 | -87.9301 | -6320159.5844 | 6.7299 | 0.0 |
| 0.1879 | 8.9143 | 700 | 0.2141 | -0.1237 | -17.0960 | -4337002.6667 | -6.8517 | -87.8667 | -6325528.9351 | 6.7280 | 0.0 |
| 0.2335 | 9.5524 | 750 | 0.2158 | -0.1282 | -17.1407 | -4329628.4444 | -6.8618 | -87.9676 | -6335641.3506 | 6.7336 | 0.0 |
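
Two relationships tie these columns together. Rewards/margins is simply Rewards/chosen minus Rewards/rejected, and under TRL's convention each reward is the implicit reward β·(log π − log π_ref). The sketch below checks both against the step-700 row, whose metrics are the evaluation results quoted at the top of this card; note that β = 0.1 is inferred from the numbers, not stated in the card:

```python
# Check the step-700 row against TRL's implicit-reward convention
# r = beta * (logp_policy - logp_ref). beta = 0.1 is an inference from
# the table, not a value stated in this card.
beta = 0.1

rewards_chosen, logps_chosen = -0.1237, -17.0960
rewards_rejected = -6.8517

# Implied reference log-prob for chosen completions: about -15.859.
# Every row in the table implies roughly the same value, as expected
# from a frozen reference model.
ref_logp_chosen = logps_chosen - rewards_chosen / beta
print(ref_logp_chosen)  # -15.859

# The margin is just the difference of the two reward columns.
print(rewards_chosen - rewards_rejected)  # 6.7280, matching Rewards/margins
```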

Framework versions

  • PEFT 0.12.0
  • Transformers 4.46.1
  • PyTorch 2.5.1+cu124
  • Datasets 3.1.0
  • Tokenizers 0.20.3