train_siqa_789_1760637945

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the siqa dataset. It achieves the following results on the evaluation set:

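Loss: 0.2025 (equal to the epoch 1 validation loss, the lowest value in the training results)
Num Input Tokens Seen: 60282336 (cumulative over all 20 epochs)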
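
As a rough usage sketch, the adapter could be attached to the base model with PEFT as shown here. This assumes the repository holds a standard PEFT adapter, that `transformers` and `peft` are installed, and that you have access to the gated base model; the function name `load_finetuned` is hypothetical and only for illustration.

```python
BASE_ID = "meta-llama/Meta-Llama-3-8B-Instruct"   # base model named in this card
ADAPTER_ID = "rbelanec/train_siqa_789_1760637945"  # repository id of this model


def load_finetuned():
    """Load the base model and attach the adapter (downloads several GB)."""
    # Imported lazily so the heavy dependencies are only needed when called.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
    base = AutoModelForCausalLM.from_pretrained(BASE_ID, torch_dtype="auto")
    # Assumption: the adapter weights sit at the root of ADAPTER_ID.
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    return tokenizer, model.eval()
```

Calling `tokenizer, model = load_finetuned()` would then give a model ready for generation. The prompt format used during fine-tuning is not documented in this card, so any prompt template is a guess.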

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
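
To make the schedule concrete, the sketch below computes the learning rate implied by a cosine scheduler with linear warmup. It is an illustration, not the trainer's own code: it assumes 150360 total optimizer steps (the final step reported in the training results) and a warmup of exactly one tenth of them, and the trainer's rounding of the warmup length may differ by a step.

```python
import math

BASE_LR = 5e-05                   # learning_rate from the list above
TOTAL_STEPS = 150360              # final step reported in the training results
WARMUP_STEPS = TOTAL_STEPS // 10  # assumption: warmup_ratio 0.1 gives 15036 steps


def lr_at(step: int) -> float:
    """Learning rate at a given step: linear warmup, then half-cosine decay."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the rate at the start, mid-warmup, peak, decay midpoint, and end.
for step in (0, 7518, 15036, 82698, 150360):
    print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

Under these assumptions the rate peaks at 5e-05 at the end of epoch 2 and decays to zero by the last step.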

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.1291	1.0	7518	0.2025	3013096
0.2265	2.0	15036	0.2143	6027776
0.0261	3.0	22554	0.2202	9041456
0.0629	4.0	30072	0.3165	12057104
0.001	5.0	37590	0.4297	15069560
0.0013	6.0	45108	0.5747	18083408
0.3939	7.0	52626	0.6421	21096352
0.0079	8.0	60144	0.5257	24110848
0.0011	9.0	67662	0.5493	27125432
0.0003	10.0	75180	0.5625	30138560
0.0001	11.0	82698	0.6793	33152696
0.0	12.0	90216	0.7465	36168144
0.0	13.0	97734	0.7437	39182840
0.0	14.0	105252	0.8272	42195440
0.0	15.0	112770	0.8775	45209672
0.0	16.0	120288	0.9335	48221640
0.0	17.0	127806	0.9849	51235560
0.0	18.0	135324	1.0282	54251480
0.0	19.0	142842	1.0427	57267064
0.0	20.0	150360	1.0453	60282336
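
The table shows training loss falling to 0.0 while validation loss rises after the first epoch, a pattern that usually indicates overfitting. A minimal sketch for picking the best epoch from these numbers follows; the values are copied from the table, and the variable names are only illustrative.

```python
# (epoch, training_loss, validation_loss) copied from the table above
ROWS = [
    (1, 0.1291, 0.2025), (2, 0.2265, 0.2143), (3, 0.0261, 0.2202),
    (4, 0.0629, 0.3165), (5, 0.001, 0.4297), (6, 0.0013, 0.5747),
    (7, 0.3939, 0.6421), (8, 0.0079, 0.5257), (9, 0.0011, 0.5493),
    (10, 0.0003, 0.5625), (11, 0.0001, 0.6793), (12, 0.0, 0.7465),
    (13, 0.0, 0.7437), (14, 0.0, 0.8272), (15, 0.0, 0.8775),
    (16, 0.0, 0.9335), (17, 0.0, 0.9849), (18, 0.0, 1.0282),
    (19, 0.0, 1.0427), (20, 0.0, 1.0453),
]

# The epoch with the lowest validation loss is the natural checkpoint to keep.
best_epoch, _, best_val = min(ROWS, key=lambda row: row[2])
final_epoch, _, final_val = ROWS[-1]

print(f"best epoch:  {best_epoch} (validation loss {best_val})")
print(f"final epoch: {final_epoch} (validation loss {final_val})")
```

The lowest validation loss occurs at epoch 1, so an early-stopping rule or a shorter run would likely have produced the same reported result at a fraction of the compute.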

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4

Model tree for rbelanec/train_siqa_789_1760637945

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model