# Llama-3.1-8B-Instruct-KTO-800
This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the bct_non_cot_kto_800 dataset. It achieves the following results on the evaluation set:
- Loss: 0.2268
- Rewards/chosen: -0.2424
- Logps/chosen: -18.9583
- Logits/chosen: -5380960.3636
- Rewards/rejected: -7.7435
- Logps/rejected: -97.2023
- Logits/rejected: -6542436.0
- Rewards/margins: 7.5011
- KL: 0.0
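Since PEFT appears under "Framework versions" below, this repo ships a parameter-efficient adapter rather than full model weights. A minimal inference sketch, assuming the adapter (and its tokenizer files) are hosted at chchen/Llama-3.1-8B-Instruct-KTO-800 and resolve to the base model stated above:

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "chchen/Llama-3.1-8B-Instruct-KTO-800"  # this repo

# AutoPeftModelForCausalLM reads the adapter config, downloads the base
# model it points at, and attaches the adapter weights on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Assumes tokenizer files were saved alongside the adapter; if not,
# load the tokenizer from the base model id instead.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

messages = [{"role": "user", "content": "Give one sentence on KTO fine-tuning."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```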
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
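The card does not name the training framework. Below is a hedged sketch of an equivalent run using TRL's `KTOTrainer`; only the hyperparameters above are taken from the card, while the dataset file and its column layout ("prompt", "completion", "label") are assumptions:

```python
# Hedged reconstruction of the training setup; not the author's actual script.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import KTOConfig, KTOTrainer

base_id = "meta-llama/Meta-Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Hyperparameters copied from the list above. The default optimizer is
# adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08, matching the card.
args = KTOConfig(
    output_dir="llama-3.1-8b-instruct-kto-800",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=8,  # 2 x 8 = 16 total train batch size
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
    seed=42,
)

# bct_non_cot_kto_800 is referenced by name only in the card; a local JSON
# file with unpaired KTO-style preference records is assumed here.
dataset = load_dataset("json", data_files="bct_non_cot_kto_800.json", split="train")

trainer = KTOTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL releases
    peft_config=LoraConfig(task_type="CAUSAL_LM"),  # PEFT adapter, per "Framework versions"
)
trainer.train()
```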
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Logps/chosen | Logits/chosen | Rewards/rejected | Logps/rejected | Logits/rejected | Rewards/margins | KL |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4996 | 0.5556 | 50 | 0.5001 | 0.0018 | -16.5156 | -7144997.8182 | 0.0023 | -19.7443 | -7765536.0 | -0.0004 | 3.2831 |
| 0.4854 | 1.1111 | 100 | 0.4841 | 0.1104 | -15.4301 | -7037787.6364 | -0.0068 | -19.8353 | -7746840.8889 | 0.1172 | 5.6073 |
| 0.3877 | 1.6667 | 150 | 0.3919 | 0.0166 | -16.3681 | -6113848.7273 | -1.1030 | -30.7975 | -7278912.4444 | 1.1196 | 1.6923 |
| 0.2713 | 2.2222 | 200 | 0.2989 | 0.2683 | -13.8509 | -5712961.0909 | -2.3006 | -42.7735 | -6877109.3333 | 2.5690 | 1.5282 |
| 0.2298 | 2.7778 | 250 | 0.2562 | 0.2113 | -14.4207 | -5691608.7273 | -4.0499 | -60.2658 | -6756057.7778 | 4.2612 | 1.6214 |
| 0.2023 | 3.3333 | 300 | 0.2438 | 0.0519 | -16.0149 | -5509416.3636 | -5.4444 | -74.2112 | -6538183.1111 | 5.4963 | 2.0586 |
| 0.2091 | 3.8889 | 350 | 0.2401 | -0.0056 | -16.5904 | -5302801.0909 | -6.0992 | -80.7588 | -6321514.2222 | 6.0935 | 1.2333 |
| 0.1803 | 4.4444 | 400 | 0.2313 | -0.0036 | -16.5705 | -5251804.7273 | -6.5164 | -84.9310 | -6353763.5556 | 6.5127 | 0.9655 |
| 0.1882 | 5.0 | 450 | 0.2316 | -0.0895 | -17.4291 | -5285734.9091 | -6.9674 | -89.4410 | -6387964.4444 | 6.8779 | 0.8871 |
| 0.2097 | 5.5556 | 500 | 0.2321 | -0.0880 | -17.4141 | -5317437.0909 | -6.9551 | -89.3176 | -6466442.2222 | 6.8671 | 0.8415 |
| 0.2101 | 6.1111 | 550 | 0.2369 | -0.2595 | -19.1287 | -5358458.1818 | -7.4899 | -94.6661 | -6556113.7778 | 7.2304 | 0.6120 |
| 0.2205 | 6.6667 | 600 | 0.2306 | -0.0927 | -17.4612 | -5311256.0 | -7.1726 | -91.4927 | -6522494.2222 | 7.0798 | 2.9167 |
| 0.2015 | 7.2222 | 650 | 0.2278 | -0.2235 | -18.7694 | -5318941.0909 | -7.7564 | -97.3308 | -6520522.6667 | 7.5328 | 3.1473 |
| 0.1847 | 7.7778 | 700 | 0.2302 | -0.2017 | -18.5512 | -5325900.7273 | -7.6276 | -96.0427 | -6506082.2222 | 7.4258 | 0.0 |
| 0.1755 | 8.3333 | 750 | 0.2296 | -0.2041 | -18.5748 | -5375173.4545 | -7.6845 | -96.6120 | -6566019.5556 | 7.4804 | 0.0 |
| 0.1484 | 8.8889 | 800 | 0.2270 | -0.2105 | -18.6391 | -5378953.8182 | -7.7307 | -97.0744 | -6566795.5556 | 7.5202 | 0.0 |
| 0.2069 | 9.4444 | 850 | 0.2268 | -0.2424 | -18.9583 | -5380960.3636 | -7.7435 | -97.2023 | -6542436.0 | 7.5011 | 0.0 |
| 0.1825 | 10.0 | 900 | 0.2275 | -0.2352 | -18.8865 | -5396778.1818 | -7.7588 | -97.3554 | -6567349.3333 | 7.5236 | 0.0 |
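As a quick sanity check on the table, Rewards/margins is (up to rounding) the gap between the chosen and rejected rewards. Using the step-850 row, which matches the evaluation summary at the top:

```python
# rewards/margins ~= rewards/chosen - rewards/rejected
rewards_chosen = -0.2424
rewards_rejected = -7.7435
print(rewards_chosen - rewards_rejected)  # 7.5011, the reported margin
```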
### Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3
## Model tree for chchen/Llama-3.1-8B-Instruct-KTO-800

- Base model: meta-llama/Llama-3.1-8B
- Fine-tuned: meta-llama/Llama-3.1-8B-Instruct (the model this adapter was trained on top of)