train_siqa_123_1760637718

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the siqa dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6259
  • Num Input Tokens Seen: 60276872
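
Since this is a PEFT adapter rather than a full model, it has to be loaded on top of the base checkpoint. The sketch below is a minimal loading and inference example, not documented usage: the adapter repo id (rbelanec/train_siqa_123_1760637718) is taken from this card, the dtype/device settings are illustrative, and the SIQA-style prompt is invented, since the prompt template used during fine-tuning is not documented here.

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_siqa_123_1760637718"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the fine-tuned adapter weights to the frozen base model.
model = PeftModel.from_pretrained(model, adapter_id)

# Illustrative SIQA-style prompt (the actual training template is undocumented).
messages = [{
    "role": "user",
    "content": (
        "Taylor helped a friend who was struggling to keep up with their bills. "
        "What will the friend want to do next? "
        "(a) pay it forward (b) ignore Taylor (c) avoid their bills"
    ),
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```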

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the equivalent transformers.TrainingArguments follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
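
For reference, these values map onto transformers.TrainingArguments roughly as follows. This is a hypothetical reconstruction, not the original training script: output_dir and any settings not listed above are assumptions.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the configuration listed above;
# output_dir is illustrative, not taken from the original run.
args = TrainingArguments(
    output_dir="train_siqa_123_1760637718",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```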

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.1771        | 1.0   | 7518   | 0.6354          | 3014896           |
| 0.5044        | 2.0   | 15036  | 0.6309          | 6029360           |
| 0.5528        | 3.0   | 22554  | 0.6298          | 9042368           |
| 1.1872        | 4.0   | 30072  | 0.6283          | 12055456          |
| 0.5423        | 5.0   | 37590  | 0.6301          | 15068512          |
| 0.425         | 6.0   | 45108  | 0.6280          | 18081672          |
| 0.8483        | 7.0   | 52626  | 0.6273          | 21095960          |
| 1.5292        | 8.0   | 60144  | 0.6305          | 24109392          |
| 0.813         | 9.0   | 67662  | 0.6276          | 27122856          |
| 1.7637        | 10.0  | 75180  | 0.6298          | 30137256          |
| 0.399         | 11.0  | 82698  | 0.6302          | 33151024          |
| 0.2931        | 12.0  | 90216  | 0.6291          | 36165000          |
| 0.4369        | 13.0  | 97734  | 0.6285          | 39178496          |
| 0.8953        | 14.0  | 105252 | 0.6303          | 42193184          |
| 0.7673        | 15.0  | 112770 | 0.6303          | 45206576          |
| 0.2509        | 16.0  | 120288 | 0.6296          | 48220600          |
| 0.2142        | 17.0  | 127806 | 0.6308          | 51235344          |
| 0.5036        | 18.0  | 135324 | 0.6259          | 54249648          |
| 0.2148        | 19.0  | 142842 | 0.6288          | 57262600          |
| 1.737         | 20.0  | 150360 | 0.6288          | 60276872          |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4