# Llama-3.1-8B-Instruct-KTO-800
This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the bct_non_cot_kto_800 dataset. It achieves the following results on the evaluation set:
- Loss: 0.2268
- Rewards/chosen: -0.2424
- Logps/chosen: -18.9583
- Logits/chosen: -5380960.3636
- Rewards/rejected: -7.7435
- Logps/rejected: -97.2023
- Logits/rejected: -6542436.0
- Rewards/margins: 7.5011
- KL: 0.0
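Since PEFT appears under "Framework versions" below, this repo ships a parameter-efficient adapter rather than full model weights. A minimal inference sketch, assuming the adapter (and its tokenizer files) are hosted at chchen/Llama-3.1-8B-Instruct-KTO-800 and resolve to the base model stated above:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "chchen/Llama-3.1-8B-Instruct-KTO-800"  # this repo

# AutoPeftModelForCausalLM reads the adapter config, downloads the base
# model it points at, and attaches the adapter weights on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Assumes tokenizer files were saved alongside the adapter; if not,
# load the tokenizer from the base model id instead.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

messages = [{"role": "user", "content": "Give one sentence on KTO fine-tuning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```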
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
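The card does not name the training framework. Below is a hedged sketch of an equivalent run using TRL's `KTOTrainer`; only the hyperparameters above are taken from the card, while the dataset file and its column layout ("prompt", "completion", "label") are assumptions:

```python
# Hedged reconstruction of the training setup; not the author's actual script.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Hyperparameters copied from the list above. The default optimizer is
# adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08, matching the card.
args = KTOConfig(
    output_dir="llama-3.1-8b-instruct-kto-800",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,  # 2 x 8 = 16 total train batch size
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
    seed=42,
)

# bct_non_cot_kto_800 is referenced by name only in the card; a local JSON
# file with unpaired KTO-style preference records is assumed here.
dataset = load_dataset("json", data_files="bct_non_cot_kto_800.json", split="train")

trainer = KTOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # PEFT adapter, per "Framework versions"
)
trainer.train()
```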
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Logps/chosen | Logits/chosen | Rewards/rejected | Logps/rejected | Logits/rejected | Rewards/margins | KL |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4996 | 0.5556 | 50 | 0.5001 | 0.0018 | -16.5156 | -7144997.8182 | 0.0023 | -19.7443 | -7765536.0 | -0.0004 | 3.2831 |
| 0.4854 | 1.1111 | 100 | 0.4841 | 0.1104 | -15.4301 | -7037787.6364 | -0.0068 | -19.8353 | -7746840.8889 | 0.1172 | 5.6073 |
| 0.3877 | 1.6667 | 150 | 0.3919 | 0.0166 | -16.3681 | -6113848.7273 | -1.1030 | -30.7975 | -7278912.4444 | 1.1196 | 1.6923 |
| 0.2713 | 2.2222 | 200 | 0.2989 | 0.2683 | -13.8509 | -5712961.0909 | -2.3006 | -42.7735 | -6877109.3333 | 2.5690 | 1.5282 |
| 0.2298 | 2.7778 | 250 | 0.2562 | 0.2113 | -14.4207 | -5691608.7273 | -4.0499 | -60.2658 | -6756057.7778 | 4.2612 | 1.6214 |
| 0.2023 | 3.3333 | 300 | 0.2438 | 0.0519 | -16.0149 | -5509416.3636 | -5.4444 | -74.2112 | -6538183.1111 | 5.4963 | 2.0586 |
| 0.2091 | 3.8889 | 350 | 0.2401 | -0.0056 | -16.5904 | -5302801.0909 | -6.0992 | -80.7588 | -6321514.2222 | 6.0935 | 1.2333 |
| 0.1803 | 4.4444 | 400 | 0.2313 | -0.0036 | -16.5705 | -5251804.7273 | -6.5164 | -84.9310 | -6353763.5556 | 6.5127 | 0.9655 |
| 0.1882 | 5.0 | 450 | 0.2316 | -0.0895 | -17.4291 | -5285734.9091 | -6.9674 | -89.4410 | -6387964.4444 | 6.8779 | 0.8871 |
| 0.2097 | 5.5556 | 500 | 0.2321 | -0.0880 | -17.4141 | -5317437.0909 | -6.9551 | -89.3176 | -6466442.2222 | 6.8671 | 0.8415 |
| 0.2101 | 6.1111 | 550 | 0.2369 | -0.2595 | -19.1287 | -5358458.1818 | -7.4899 | -94.6661 | -6556113.7778 | 7.2304 | 0.6120 |
| 0.2205 | 6.6667 | 600 | 0.2306 | -0.0927 | -17.4612 | -5311256.0 | -7.1726 | -91.4927 | -6522494.2222 | 7.0798 | 2.9167 |
| 0.2015 | 7.2222 | 650 | 0.2278 | -0.2235 | -18.7694 | -5318941.0909 | -7.7564 | -97.3308 | -6520522.6667 | 7.5328 | 3.1473 |
| 0.1847 | 7.7778 | 700 | 0.2302 | -0.2017 | -18.5512 | -5325900.7273 | -7.6276 | -96.0427 | -6506082.2222 | 7.4258 | 0.0 |
| 0.1755 | 8.3333 | 750 | 0.2296 | -0.2041 | -18.5748 | -5375173.4545 | -7.6845 | -96.6120 | -6566019.5556 | 7.4804 | 0.0 |
| 0.1484 | 8.8889 | 800 | 0.2270 | -0.2105 | -18.6391 | -5378953.8182 | -7.7307 | -97.0744 | -6566795.5556 | 7.5202 | 0.0 |
| 0.2069 | 9.4444 | 850 | 0.2268 | -0.2424 | -18.9583 | -5380960.3636 | -7.7435 | -97.2023 | -6542436.0 | 7.5011 | 0.0 |
| 0.1825 | 10.0 | 900 | 0.2275 | -0.2352 | -18.8865 | -5396778.1818 | -7.7588 | -97.3554 | -6567349.3333 | 7.5236 | 0.0 |
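As a quick sanity check on the table, Rewards/margins is (up to rounding) the gap between the chosen and rejected rewards. Using the step-850 row, which matches the evaluation summary at the top:

```python
# rewards/margins ~= rewards/chosen - rewards/rejected
rewards_chosen = -0.2424
rewards_rejected = -7.7435
print(rewards_chosen - rewards_rejected)  # 7.5011, the reported margin
```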
### Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3
## Model tree for chchen/Llama-3.1-8B-Instruct-KTO-800

- Base model: meta-llama/Llama-3.1-8B
- Fine-tuned: meta-llama/Llama-3.1-8B-Instruct (the model this adapter was trained on top of)