# Llama-3.1-8B-Instruct-KTO-900
This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the bct_non_cot_kto_900 dataset. It achieves the following results on the evaluation set:
- Loss: 0.2336
- Rewards/chosen: -0.7387
- Logps/chosen: -23.9743
- Logits/chosen: -3936346.7253
- Rewards/rejected: -8.0820
- Logps/rejected: -101.9145
- Logits/rejected: -5595913.7079
- Rewards/margins: 7.3434
- Kl: 0.0
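As a sanity check on the metrics above, the reported reward margin is simply the chosen reward minus the rejected reward. A minimal sketch, with the values copied from the list above (variable names are illustrative, not trainer internals):

```python
# Reward margin = chosen reward - rejected reward.
# Values copied from the evaluation metrics above.
rewards_chosen = -0.7387
rewards_rejected = -8.0820

rewards_margin = rewards_chosen - rewards_rejected
print(round(rewards_margin, 4))  # agrees with the reported 7.3434 up to rounding
```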
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
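The listed total_train_batch_size follows from the per-device batch size and the gradient accumulation steps. A minimal sketch, assuming a single training device (no data parallelism):

```python
# Effective (total) train batch size = per-device batch size
# x gradient accumulation steps, assuming a single device.
train_batch_size = 2
gradient_accumulation_steps = 8

total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 16, matching the hyperparameter list
```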
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Logps/chosen | Logits/chosen | Rewards/rejected | Logps/rejected | Logits/rejected | Rewards/margins | Kl |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4998 | 0.4938 | 50 | 0.4998 | 0.0071 | -16.5171 | -6513294.0659 | 0.0060 | -21.0342 | -8309430.6517 | 0.0011 | 7.3805 |
| 0.4916 | 0.9877 | 100 | 0.4882 | 0.0450 | -16.1375 | -6392945.9341 | -0.0510 | -21.6039 | -8292936.6292 | 0.0960 | 1.8784 |
| 0.388 | 1.4815 | 150 | 0.3815 | 0.1292 | -15.2958 | -5358742.5055 | -0.9531 | -30.6250 | -7796024.0899 | 1.0823 | 3.1272 |
| 0.3027 | 1.9753 | 200 | 0.2895 | 0.2877 | -13.7103 | -4776840.7912 | -2.2363 | -43.4567 | -7301079.7303 | 2.5240 | 0.0 |
| 0.2489 | 2.4691 | 250 | 0.2521 | 0.2507 | -14.0810 | -4633505.4066 | -3.7515 | -58.6088 | -6882766.3820 | 4.0022 | 0.0 |
| 0.2086 | 2.9630 | 300 | 0.2438 | 0.0154 | -16.4339 | -4421153.7582 | -5.2262 | -73.3562 | -6361427.4157 | 5.2416 | 0.0 |
| 0.1899 | 3.4568 | 350 | 0.2428 | -0.2031 | -18.6187 | -4157514.5495 | -5.8225 | -79.3186 | -5951174.4719 | 5.6194 | 0.0 |
| 0.25 | 3.9506 | 400 | 0.2462 | -0.2869 | -19.4569 | -4007848.4396 | -6.0265 | -81.3591 | -5686904.8090 | 5.7396 | 0.0 |
| 0.2039 | 4.4444 | 450 | 0.2406 | -0.3700 | -20.2878 | -3949201.2308 | -6.5107 | -86.2007 | -5679643.3258 | 6.1407 | 0.0 |
| 0.1757 | 4.9383 | 500 | 0.2408 | -0.5367 | -21.9549 | -3883009.7582 | -7.2303 | -93.3975 | -5592792.4494 | 6.6936 | 0.0 |
| 0.1745 | 5.4321 | 550 | 0.2381 | -0.4705 | -21.2929 | -3896917.0989 | -6.9828 | -90.9218 | -5625497.8876 | 6.5123 | 0.0 |
| 0.1966 | 5.9259 | 600 | 0.2399 | -0.6421 | -23.0091 | -3685890.8132 | -7.6576 | -97.6701 | -5304065.7978 | 7.0155 | 0.0 |
| 0.1717 | 6.4198 | 650 | 0.2365 | -0.6360 | -22.9479 | -3976765.8901 | -7.6226 | -97.3204 | -5631893.5730 | 6.9866 | 0.0 |
| 0.1746 | 6.9136 | 700 | 0.2336 | -0.7387 | -23.9743 | -3936346.7253 | -8.0820 | -101.9145 | -5595913.7079 | 7.3434 | 0.0 |
| 0.1586 | 7.4074 | 750 | 0.2342 | -0.7033 | -23.6211 | -3965146.7253 | -7.9157 | -100.2512 | -5631508.8539 | 7.2124 | 0.0 |
| 0.1651 | 7.9012 | 800 | 0.2346 | -0.7270 | -23.8581 | -3923032.9670 | -8.0741 | -101.8346 | -5564442.2472 | 7.3470 | 0.0 |
| 0.1761 | 8.3951 | 850 | 0.2382 | -0.8088 | -24.6761 | -3866978.8132 | -8.2918 | -104.0118 | -5499030.6517 | 7.4829 | 0.0 |
| 0.1829 | 8.8889 | 900 | 0.2382 | -0.8048 | -24.6361 | -3876442.3736 | -8.3157 | -104.2511 | -5500599.3708 | 7.5109 | 0.0 |
| 0.1621 | 9.3827 | 950 | 0.2386 | -0.8111 | -24.6987 | -3862863.1209 | -8.3110 | -104.2041 | -5483545.5281 | 7.4999 | 0.0 |
| 0.1824 | 9.8765 | 1000 | 0.2388 | -0.8165 | -24.7532 | -3883063.5604 | -8.3197 | -104.2909 | -5463227.6854 | 7.5031 | 0.0 |
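For reference, the per-example KTO objective behind the Rewards/* and Kl columns can be sketched as follows. The β value, the λ weights, and all names here are illustrative assumptions, not values taken from this training run:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def kto_loss(policy_logp, ref_logp, kl_ref, beta=0.1, desirable=True,
             lambda_d=1.0, lambda_u=1.0):
    """Illustrative per-example KTO loss.

    policy_logp / ref_logp: log-likelihood of the completion under the
    policy and the frozen reference model; kl_ref is the reference-KL
    term (logged as "Kl" in the table above). beta and the lambda
    weights are assumed defaults, not taken from this run.
    """
    # Implicit reward: log-ratio of policy to reference likelihoods.
    r = policy_logp - ref_logp
    if desirable:
        # Desirable example: loss shrinks as the reward exceeds the KL term.
        return lambda_d * (1.0 - sigmoid(beta * (r - kl_ref)))
    # Undesirable example: loss shrinks as the reward falls below the KL term.
    return lambda_u * (1.0 - sigmoid(beta * (kl_ref - r)))
```

With kl_ref = 0 (as logged in the later rows above), a desirable completion that the policy up-weights relative to the reference yields a loss below 0.5, while down-weighting it pushes the loss above 0.5; for undesirable completions the direction is reversed.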
### Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3