# Llama-3.1-8B-Instruct-KTO-500
This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the bct_non_cot_kto_500 dataset. It achieves the following results on the evaluation set (a brief note on how to read these metrics follows the list):
- Loss: 0.2733
- Rewards/chosen: -0.7733
- Logps/chosen: -22.9961
- Logits/chosen: -5270992.0
- Rewards/rejected: -5.5785
- Logps/rejected: -75.2760
- Logits/rejected: -6459799.04
- Rewards/margins: 4.8052
- KL: 0.0
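These metric names match the logs of TRL's KTOTrainer. Assuming that convention (an assumption; the card does not name the training framework), each reward is the β-scaled log-probability ratio between the fine-tuned policy and the frozen reference model, and the margin is the chosen/rejected gap:

$$
r_\theta(x, y) = \beta\,\bigl(\log \pi_\theta(y \mid x) - \log \pi_{\mathrm{ref}}(y \mid x)\bigr),
\qquad
\text{margin} = r_\theta(x, y_{\text{chosen}}) - r_\theta(x, y_{\text{rejected}})
$$

The reported numbers are consistent with this reading: -0.7733 - (-5.5785) = 4.8052, matching Rewards/margins. The value of β is not stated in the card.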
## Model description
More information needed
## Intended uses & limitations
More information needed
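The card provides no usage code. Below is a minimal inference sketch, assuming this repository hosts a PEFT (LoRA-style) adapter on top of the instruct base model, as the PEFT entry under "Framework versions" suggests.

```python
# Minimal inference sketch (assumption: the repo hosts a PEFT adapter;
# this snippet is illustrative and not part of the original card).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE = "meta-llama/Meta-Llama-3.1-8B-Instruct"
ADAPTER = "chchen/Llama-3.1-8B-Instruct-KTO-500"

tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(
    BASE, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER)  # attach the KTO adapter
model.eval()

messages = [{"role": "user", "content": "Briefly explain KTO fine-tuning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```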
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged mapping to code is sketched after the list):
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
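The card does not name the training framework, but the metric names above match TRL's KTOTrainer. Under that assumption, the listed hyperparameters would map roughly to a TRL `KTOConfig` as sketched below; this is illustrative, not the actual training script, and the precision flag is an assumption.

```python
# Hedged sketch: maps the card's hyperparameters onto TRL's KTOConfig.
# Assumptions are marked inline; this is not the original training code.
from trl import KTOConfig

config = KTOConfig(
    output_dir="Llama-3.1-8B-Instruct-KTO-500",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,  # 2 per device x 8 steps = 16 effective
    num_train_epochs=10.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    bf16=True,  # assumption: precision is not stated in the card
)
# A KTOTrainer(model=..., ref_model=..., args=config, ...) would consume this.
```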
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Logps/chosen | Logits/chosen | Rewards/rejected | Logps/rejected | Logits/rejected | Rewards/margins | KL |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.499 | 0.8889 | 50 | 0.4996 | 0.0154 | -15.1082 | -6797012.48 | 0.0162 | -19.3286 | -7105245.44 | -0.0008 | 3.8204 |
| 0.4715 | 1.7778 | 100 | 0.4705 | 0.2738 | -12.5245 | -6498615.68 | 0.0353 | -19.1378 | -7022910.72 | 0.2385 | 12.4696 |
| 0.3459 | 2.6667 | 150 | 0.3755 | 0.0903 | -14.3596 | -5608189.44 | -1.1001 | -30.4920 | -6636171.52 | 1.1905 | 0.7521 |
| 0.2879 | 3.5556 | 200 | 0.3254 | 0.0251 | -15.0117 | -5304341.76 | -2.3085 | -42.5760 | -6579046.4 | 2.3336 | 0.0 |
| 0.2319 | 4.4444 | 250 | 0.3015 | -0.1526 | -16.7888 | -5472120.32 | -3.5465 | -54.9555 | -6862511.36 | 3.3939 | 3.7907 |
| 0.1971 | 5.3333 | 300 | 0.2927 | -0.4757 | -20.0192 | -5352769.92 | -4.5843 | -65.3341 | -6700284.16 | 4.1087 | 0.0 |
| 0.1825 | 6.2222 | 350 | 0.2855 | -0.6676 | -21.9389 | -5317811.2 | -5.1406 | -70.8970 | -6587809.28 | 4.4730 | 0.0 |
| 0.1996 | 7.1111 | 400 | 0.2804 | -0.6851 | -22.1139 | -5272716.48 | -5.2919 | -72.4094 | -6501901.44 | 4.6068 | 0.0 |
| 0.1776 | 8.0 | 450 | 0.2753 | -0.7492 | -22.7551 | -5282150.4 | -5.4837 | -74.3277 | -6475059.2 | 4.7345 | 0.0 |
| 0.208 | 8.8889 | 500 | 0.2733 | -0.7733 | -22.9961 | -5270992.0 | -5.5785 | -75.2760 | -6459799.04 | 4.8052 | 0.0 |
| 0.2075 | 9.7778 | 550 | 0.2766 | -0.7819 | -23.0821 | -5283820.8 | -5.5377 | -74.8675 | -6485849.6 | 4.7557 | 0.0 |
### Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3
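To check a local environment against these pins, here is a small sanity script (illustrative; nearby versions may also work; note that the PyTorch package is imported as `torch`):

```python
# Compare installed package versions against those listed in the card.
import importlib

expected = {
    "peft": "0.12.0",
    "transformers": "4.46.1",
    "torch": "2.5.1+cu124",  # listed as "Pytorch" in the card
    "datasets": "3.1.0",
    "tokenizers": "0.20.3",
}
for name, version in expected.items():
    installed = importlib.import_module(name).__version__
    status = "OK" if installed == version else "differs"
    print(f"{name}: installed {installed}, card lists {version} ({status})")
```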