# Llama-3.1-8B-Instruct-KTO-900
This model is a fine-tuned version of meta-llama/Meta-Llama-3.1-8B-Instruct on the bct_non_cot_kto_900 dataset. It achieves the following results on the evaluation set:
- Loss: 0.2336
- Rewards/chosen: -0.7387
- Logps/chosen: -23.9743
- Logits/chosen: -3936346.7253
- Rewards/rejected: -8.0820
- Logps/rejected: -101.9145
- Logits/rejected: -5595913.7079
- Rewards/margins: 7.3434
- Kl: 0.0
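As a sanity check on the metrics above, the reported reward margin is simply the chosen reward minus the rejected reward. A minimal sketch, with the values copied from the list above (variable names are illustrative, not trainer internals):

```python
# Reward margin = chosen reward - rejected reward.
# Values copied from the evaluation metrics above.
rewards_chosen = -0.7387
rewards_rejected = -8.0820

rewards_margin = rewards_chosen - rewards_rejected
print(round(rewards_margin, 4))  # agrees with the reported 7.3434 up to rounding
```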
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-06
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
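The listed total_train_batch_size follows from the per-device batch size and the gradient accumulation steps. A minimal sketch, assuming a single training device (no data parallelism):

```python
# Effective (total) train batch size = per-device batch size
# x gradient accumulation steps, assuming a single device.
train_batch_size = 2
gradient_accumulation_steps = 8

total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)  # 16, matching the hyperparameter list
```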
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Logps/chosen | Logits/chosen | Rewards/rejected | Logps/rejected | Logits/rejected | Rewards/margins | Kl |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4998 | 0.4938 | 50 | 0.4998 | 0.0071 | -16.5171 | -6513294.0659 | 0.0060 | -21.0342 | -8309430.6517 | 0.0011 | 7.3805 |
| 0.4916 | 0.9877 | 100 | 0.4882 | 0.0450 | -16.1375 | -6392945.9341 | -0.0510 | -21.6039 | -8292936.6292 | 0.0960 | 1.8784 |
| 0.388 | 1.4815 | 150 | 0.3815 | 0.1292 | -15.2958 | -5358742.5055 | -0.9531 | -30.6250 | -7796024.0899 | 1.0823 | 3.1272 |
| 0.3027 | 1.9753 | 200 | 0.2895 | 0.2877 | -13.7103 | -4776840.7912 | -2.2363 | -43.4567 | -7301079.7303 | 2.5240 | 0.0 |
| 0.2489 | 2.4691 | 250 | 0.2521 | 0.2507 | -14.0810 | -4633505.4066 | -3.7515 | -58.6088 | -6882766.3820 | 4.0022 | 0.0 |
| 0.2086 | 2.9630 | 300 | 0.2438 | 0.0154 | -16.4339 | -4421153.7582 | -5.2262 | -73.3562 | -6361427.4157 | 5.2416 | 0.0 |
| 0.1899 | 3.4568 | 350 | 0.2428 | -0.2031 | -18.6187 | -4157514.5495 | -5.8225 | -79.3186 | -5951174.4719 | 5.6194 | 0.0 |
| 0.25 | 3.9506 | 400 | 0.2462 | -0.2869 | -19.4569 | -4007848.4396 | -6.0265 | -81.3591 | -5686904.8090 | 5.7396 | 0.0 |
| 0.2039 | 4.4444 | 450 | 0.2406 | -0.3700 | -20.2878 | -3949201.2308 | -6.5107 | -86.2007 | -5679643.3258 | 6.1407 | 0.0 |
| 0.1757 | 4.9383 | 500 | 0.2408 | -0.5367 | -21.9549 | -3883009.7582 | -7.2303 | -93.3975 | -5592792.4494 | 6.6936 | 0.0 |
| 0.1745 | 5.4321 | 550 | 0.2381 | -0.4705 | -21.2929 | -3896917.0989 | -6.9828 | -90.9218 | -5625497.8876 | 6.5123 | 0.0 |
| 0.1966 | 5.9259 | 600 | 0.2399 | -0.6421 | -23.0091 | -3685890.8132 | -7.6576 | -97.6701 | -5304065.7978 | 7.0155 | 0.0 |
| 0.1717 | 6.4198 | 650 | 0.2365 | -0.6360 | -22.9479 | -3976765.8901 | -7.6226 | -97.3204 | -5631893.5730 | 6.9866 | 0.0 |
| 0.1746 | 6.9136 | 700 | 0.2336 | -0.7387 | -23.9743 | -3936346.7253 | -8.0820 | -101.9145 | -5595913.7079 | 7.3434 | 0.0 |
| 0.1586 | 7.4074 | 750 | 0.2342 | -0.7033 | -23.6211 | -3965146.7253 | -7.9157 | -100.2512 | -5631508.8539 | 7.2124 | 0.0 |
| 0.1651 | 7.9012 | 800 | 0.2346 | -0.7270 | -23.8581 | -3923032.9670 | -8.0741 | -101.8346 | -5564442.2472 | 7.3470 | 0.0 |
| 0.1761 | 8.3951 | 850 | 0.2382 | -0.8088 | -24.6761 | -3866978.8132 | -8.2918 | -104.0118 | -5499030.6517 | 7.4829 | 0.0 |
| 0.1829 | 8.8889 | 900 | 0.2382 | -0.8048 | -24.6361 | -3876442.3736 | -8.3157 | -104.2511 | -5500599.3708 | 7.5109 | 0.0 |
| 0.1621 | 9.3827 | 950 | 0.2386 | -0.8111 | -24.6987 | -3862863.1209 | -8.3110 | -104.2041 | -5483545.5281 | 7.4999 | 0.0 |
| 0.1824 | 9.8765 | 1000 | 0.2388 | -0.8165 | -24.7532 | -3883063.5604 | -8.3197 | -104.2909 | -5463227.6854 | 7.5031 | 0.0 |
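For reference, the per-example KTO objective behind the Rewards/* and Kl columns can be sketched as follows. The β value, the λ weights, and all names here are illustrative assumptions, not values taken from this training run:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def kto_loss(policy_logp, ref_logp, kl_ref, beta=0.1, desirable=True,
             lambda_d=1.0, lambda_u=1.0):
    """Illustrative per-example KTO loss.

    policy_logp / ref_logp: log-likelihood of the completion under the
    policy and the frozen reference model; kl_ref is the reference-KL
    term (logged as "Kl" in the table above). beta and the lambda
    weights are assumed defaults, not taken from this run.
    """
    # Implicit reward: log-ratio of policy to reference likelihoods.
    r = policy_logp - ref_logp
    if desirable:
        # Desirable example: loss shrinks as the reward exceeds the KL term.
        return lambda_d * (1.0 - sigmoid(beta * (r - kl_ref)))
    # Undesirable example: loss shrinks as the reward falls below the KL term.
    return lambda_u * (1.0 - sigmoid(beta * (kl_ref - r)))
```

With kl_ref = 0 (as logged in the later rows above), a desirable completion that the policy up-weights relative to the reference yields a loss below 0.5, while down-weighting it pushes the loss above 0.5; for undesirable completions the direction is reversed.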
### Framework versions
- PEFT 0.12.0
- Transformers 4.46.1
- Pytorch 2.5.1+cu124
- Datasets 3.1.0
- Tokenizers 0.20.3