Llama-3.1-8B-Instruct-KTO-700
This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the bct_non_cot_kto_700 dataset. It achieves the following results on the evaluation set:
- Loss: 0.2141
- Rewards/chosen: -0.1237
- Logps/chosen: -17.0960
- Logits/chosen: -4337002.6667
- Rewards/rejected: -6.8517
- Logps/rejected: -87.8667
- Logits/rejected: -6325528.9351
- Rewards/margins: 6.7280
- Kl: 0.0
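Since the framework versions below include PEFT, this checkpoint is presumably distributed as a PEFT (LoRA-style) adapter on top of the instruct model. A minimal inference sketch under that assumption, using the repository id from the model tree below; the prompt and generation settings are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
adapter_id = "chchen/Llama-3.1-8B-Instruct-KTO-700"

tokenizer = AutoTokenizer.from_pretrained(base_id)
# device_map="auto" requires the accelerate package
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)  # attach the KTO adapter

messages = [{"role": "user", "content": "Explain KTO fine-tuning in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```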
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
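These values map directly onto standard Hugging Face `TrainingArguments` fields. A hedged sketch of that mapping; the original run may have been launched from a higher-level toolkit (e.g. TRL's KTOTrainer or LLaMA-Factory), so this is illustrative rather than the exact training script, and `output_dir` is hypothetical:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="Llama-3.1-8B-Instruct-KTO-700",  # hypothetical output path
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=8,  # 2 per device x 8 accumulation steps = total batch size 16
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```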
Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Logps/chosen | Logits/chosen | Rewards/rejected | Logps/rejected | Logits/rejected | Rewards/margins | Kl |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4998 | 0.6349 | 50 | 0.4994 | 0.0114 | -15.7443 | -6144470.3492 | 0.0050 | -19.2993 | -7140612.1558 | 0.0064 | 4.0692 |
| 0.4897 | 1.2730 | 100 | 0.4845 | 0.1738 | -14.1208 | -6027973.0794 | 0.0297 | -19.0521 | -7100376.1039 | 0.1440 | 8.7304 |
| 0.3937 | 1.9079 | 150 | 0.3745 | 0.3253 | -12.6062 | -5056723.3016 | -0.7501 | -26.8503 | -6579766.0260 | 1.0753 | 0.8562 |
| 0.2821 | 2.5460 | 200 | 0.2748 | 0.3148 | -12.7111 | -4519878.6032 | -2.1742 | -41.0916 | -6449476.5714 | 2.4890 | 0.0 |
| 0.2081 | 3.1841 | 250 | 0.2370 | 0.3186 | -12.6723 | -4612124.9524 | -3.4610 | -53.9597 | -6554266.5974 | 3.7797 | 0.0 |
| 0.2651 | 3.8190 | 300 | 0.2267 | 0.0307 | -15.5522 | -4312815.7460 | -5.2909 | -72.2581 | -6286123.2208 | 5.3215 | 0.0 |
| 0.194 | 4.4571 | 350 | 0.2218 | 0.0573 | -15.2853 | -4256054.3492 | -5.4598 | -73.9474 | -6266110.3377 | 5.5171 | 0.0 |
| 0.168 | 5.0952 | 400 | 0.2218 | -0.0280 | -16.1385 | -4093047.3651 | -6.0281 | -79.6308 | -6134765.7143 | 6.0002 | 0.0 |
| 0.2268 | 5.7302 | 450 | 0.2163 | -0.1488 | -17.3463 | -4156519.1111 | -6.7700 | -87.0499 | -6205292.8831 | 6.6213 | 0.0 |
| 0.1915 | 6.3683 | 500 | 0.2194 | -0.0833 | -16.6922 | -4369097.6508 | -6.4897 | -84.2469 | -6383271.0649 | 6.4064 | 0.0 |
| 0.201 | 7.0063 | 550 | 0.2189 | -0.1198 | -17.0568 | -4304407.3651 | -6.7293 | -86.6421 | -6343543.6883 | 6.6095 | 0.0 |
| 0.1961 | 7.6413 | 600 | 0.2157 | -0.1073 | -16.9320 | -4324965.0794 | -6.7349 | -86.6987 | -6328156.6753 | 6.6276 | 0.0 |
| 0.1721 | 8.2794 | 650 | 0.2157 | -0.1282 | -17.1405 | -4329249.5238 | -6.8581 | -87.9301 | -6320159.5844 | 6.7299 | 0.0 |
| 0.1879 | 8.9143 | 700 | 0.2141 | -0.1237 | -17.0960 | -4337002.6667 | -6.8517 | -87.8667 | -6325528.9351 | 6.7280 | 0.0 |
| 0.2335 | 9.5524 | 750 | 0.2158 | -0.1282 | -17.1407 | -4329628.4444 | -6.8618 | -87.9676 | -6335641.3506 | 6.7336 | 0.0 |
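As a quick sanity check on these logs, Rewards/margins appears to equal Rewards/chosen minus Rewards/rejected (the usual convention in TRL-style preference-tuning metrics; treat that as an assumption). For the step-700 row:

```python
# Assumption: Rewards/margins = Rewards/chosen - Rewards/rejected
chosen, rejected = -0.1237, -6.8517  # step-700 evaluation row
print(f"{chosen - rejected:.4f}")    # -> 6.7280, the reported Rewards/margins
```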
Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3
Model tree for chchen/Llama-3.1-8B-Instruct-KTO-700
- Base model: meta-llama/Llama-3.1-8B
- Fine-tuned from: meta-llama/Llama-3.1-8B-Instruct