train_siqa_789_1760637944

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the siqa (Social IQa) dataset. It achieves the following results on the evaluation set:

  • Loss: 1.9175
  • Num Input Tokens Seen: 60282336

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 789
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
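
Since the framework versions below list PEFT, this run was presumably a parameter-efficient fine-tune. As a minimal sketch of how the hyperparameters above map onto a Transformers training setup, assuming a default LoRA adapter config (the adapter settings are illustrative, not recorded in this card):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, TrainingArguments

# Base model named in this card; the LoRA config is an illustrative assumption.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM"))

# Hyperparameters taken verbatim from the list above.
args = TrainingArguments(
    output_dir="train_siqa_789_1760637944",
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=789,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```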

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.547         | 1.0   | 7518   | 0.5505          | 3013096           |
| 0.5477        | 2.0   | 15036  | 0.5511          | 6027776           |
| 0.5391        | 3.0   | 22554  | 0.5240          | 9041456           |
| 0.4232        | 4.0   | 30072  | 0.4350          | 12057104          |
| 0.2357        | 5.0   | 37590  | 0.1994          | 15069560          |
| 0.1689        | 6.0   | 45108  | 0.1833          | 18083408          |
| 0.1693        | 7.0   | 52626  | 0.1811          | 21096352          |
| 0.1052        | 8.0   | 60144  | 0.1788          | 24110848          |
| 0.1164        | 9.0   | 67662  | 0.1800          | 27125432          |
| 0.067         | 10.0  | 75180  | 0.1799          | 30138560          |
| 0.1356        | 11.0  | 82698  | 0.1847          | 33152696          |
| 0.1388        | 12.0  | 90216  | 0.1894          | 36168144          |
| 0.0299        | 13.0  | 97734  | 0.2051          | 39182840          |
| 0.0888        | 14.0  | 105252 | 0.2116          | 42195440          |
| 0.0411        | 15.0  | 112770 | 0.2209          | 45209672          |
| 0.089         | 16.0  | 120288 | 0.2332          | 48221640          |
| 0.1245        | 17.0  | 127806 | 0.2547          | 51235560          |
| 0.1325        | 18.0  | 135324 | 0.2637          | 54251480          |
| 0.029         | 19.0  | 142842 | 0.2732          | 57267064          |
| 0.0026        | 20.0  | 150360 | 0.2766          | 60282336          |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
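
Given the PEFT version listed above, the checkpoint is presumably a PEFT adapter on top of the base model rather than full model weights. A minimal inference sketch, assuming the adapter is published as rbelanec/train_siqa_789_1760637944 and that you have access to the gated base model:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"    # base model named in this card
adapter_id = "rbelanec/train_siqa_789_1760637944"  # assumed repo id for this adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter

inputs = tokenizer("Example prompt", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```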