# Llama-3.1-8B-Instruct-KTO-500
This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the bct_non_cot_kto_500 dataset. It achieves the following results on the evaluation set (a brief note on how to read these metrics follows the list):
- Loss: 0.2733
- Rewards/chosen: -0.7733
- Logps/chosen: -22.9961
- Logits/chosen: -5270992.0
- Rewards/rejected: -5.5785
- Logps/rejected: -75.2760
- Logits/rejected: -6459799.04
- Rewards/margins: 4.8052
- KL: 0.0
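These metric names match the logs of TRL's KTOTrainer. Assuming that convention (an assumption; the card does not name the training framework), each reward is the β-scaled log-probability ratio between the fine-tuned policy and the frozen reference model, and the margin is the chosen/rejected gap:

$$
r_\theta(x, y) = \beta\,\bigl(\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\bigr),
\qquad
\text{margin} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
$$

The reported numbers are consistent with this reading: -0.7733 - (-5.5785) = 4.8052, matching Rewards/margins. The value of β is not stated in the card.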
## Model description
More information needed
## Intended uses & limitations
More information needed
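The card provides no usage code. Below is a minimal inference sketch, assuming this repository hosts a PEFT (LoRA-style) adapter on top of the instruct base model, as the PEFT entry under "Framework versions" suggests.

```python
# Minimal inference sketch (assumption: the repo hosts a PEFT adapter;
# this snippet is illustrative and not part of the original card).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Meta-Llama-3.1-8B-Instruct"
ADAPTER = "chchen/Llama-3.1-8B-Instruct-KTO-500"

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER)  # attach the KTO adapter
model.eval()

messages = [{"role": "user", "content": "Briefly explain KTO fine-tuning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```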
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged mapping to code is sketched after the list):
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
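The card does not name the training framework, but the metric names above match TRL's KTOTrainer. Under that assumption, the listed hyperparameters would map roughly to a TRL `KTOConfig` as sketched below; this is illustrative, not the actual training script, and the precision flag is an assumption.

```python
# Hedged sketch: maps the card's hyperparameters onto TRL's KTOConfig.
# Assumptions are marked inline; this is not the original training code.
from trl import KTOConfig

config = KTOConfig(
    output_dir="Llama-3.1-8B-Instruct-KTO-500",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,  # 2 per device x 8 steps = 16 effective
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption: precision is not stated in the card
)
# A KTOTrainer(model=..., ref_model=..., args=config, ...) would consume this.
```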
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Logps/chosen | Logits/chosen | Rewards/rejected | Logps/rejected | Logits/rejected | Rewards/margins | KL |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.499 | 0.8889 | 50 | 0.4996 | 0.0154 | -15.1082 | -6797012.48 | 0.0162 | -19.3286 | -7105245.44 | -0.0008 | 3.8204 |
| 0.4715 | 1.7778 | 100 | 0.4705 | 0.2738 | -12.5245 | -6498615.68 | 0.0353 | -19.1378 | -7022910.72 | 0.2385 | 12.4696 |
| 0.3459 | 2.6667 | 150 | 0.3755 | 0.0903 | -14.3596 | -5608189.44 | -1.1001 | -30.4920 | -6636171.52 | 1.1905 | 0.7521 |
| 0.2879 | 3.5556 | 200 | 0.3254 | 0.0251 | -15.0117 | -5304341.76 | -2.3085 | -42.5760 | -6579046.4 | 2.3336 | 0.0 |
| 0.2319 | 4.4444 | 250 | 0.3015 | -0.1526 | -16.7888 | -5472120.32 | -3.5465 | -54.9555 | -6862511.36 | 3.3939 | 3.7907 |
| 0.1971 | 5.3333 | 300 | 0.2927 | -0.4757 | -20.0192 | -5352769.92 | -4.5843 | -65.3341 | -6700284.16 | 4.1087 | 0.0 |
| 0.1825 | 6.2222 | 350 | 0.2855 | -0.6676 | -21.9389 | -5317811.2 | -5.1406 | -70.8970 | -6587809.28 | 4.4730 | 0.0 |
| 0.1996 | 7.1111 | 400 | 0.2804 | -0.6851 | -22.1139 | -5272716.48 | -5.2919 | -72.4094 | -6501901.44 | 4.6068 | 0.0 |
| 0.1776 | 8.0 | 450 | 0.2753 | -0.7492 | -22.7551 | -5282150.4 | -5.4837 | -74.3277 | -6475059.2 | 4.7345 | 0.0 |
| 0.208 | 8.8889 | 500 | 0.2733 | -0.7733 | -22.9961 | -5270992.0 | -5.5785 | -75.2760 | -6459799.04 | 4.8052 | 0.0 |
| 0.2075 | 9.7778 | 550 | 0.2766 | -0.7819 | -23.0821 | -5283820.8 | -5.5377 | -74.8675 | -6485849.6 | 4.7557 | 0.0 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3
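To check a local environment against these pins, here is a small sanity script (illustrative; nearby versions may also work; note that the PyTorch package is imported as `torch`):

```python
# Compare installed package versions against those listed in the card.
import importlib

expected = {
    "peft": "0.12.0",
    "transformers": "4.46.1",
    "torch": "2.5.1+cu124",  # listed as "Pytorch" in the card
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}
for name, version in expected.items():
    installed = importlib.import_module(name).__version__
    status = "OK" if installed == version else "differs"
    print(f"{name}: installed {installed}, card lists {version} ({status})")
```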