train_siqa_789_1760637947

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the siqa dataset. It achieves the following results on the evaluation set:

Loss: 0.1901 (the lowest validation loss in the training results, reached at epoch 12)
Num Input Tokens Seen: 60282336

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
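
To make the scheduler settings concrete, the sketch below computes the learning rate at a given step. It assumes the common linear-warmup-then-cosine-decay shape (the one the Transformers cosine scheduler uses by default), takes the schedule length from the final step in the training results (150360), and rounds the warmup length to a whole step. The names `lr_at`, `TOTAL_STEPS`, and `WARMUP_STEPS` are illustrative and do not come from the training code.

```python
import math

BASE_LR = 5e-05        # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 150360   # assumption: schedule length equals the final logged step

# Assumption: warmup length is the ratio applied to the total step count.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step under linear warmup + cosine decay."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR over the warmup phase.
        return BASE_LR * step / WARMUP_STEPS
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from BASE_LR down to 0.
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(step, lr_at(step))
```

Under these assumptions the warmup phase would span the first two epochs, after which the rate decays smoothly to zero by the last step.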

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.3371	1.0	7518	0.2678	3013096
0.1982	2.0	15036	0.2279	6027776
0.1646	3.0	22554	0.2104	9041456
0.1794	4.0	30072	0.2022	12057104
0.4498	5.0	37590	0.1975	15069560
0.2096	6.0	45108	0.1953	18083408
0.1605	7.0	52626	0.1930	21096352
0.094	8.0	60144	0.1910	24110848
0.0974	9.0	67662	0.1932	27125432
0.0905	10.0	75180	0.1912	30138560
0.0652	11.0	82698	0.1913	33152696
0.105	12.0	90216	0.1901	36168144
0.1545	13.0	97734	0.1911	39182840
0.1125	14.0	105252	0.1933	42195440
0.1234	15.0	112770	0.1920	45209672
0.1512	16.0	120288	0.1931	48221640
0.1335	17.0	127806	0.1931	51235560
0.2893	18.0	135324	0.1934	54251480
0.1522	19.0	142842	0.1936	57267064
0.0642	20.0	150360	0.1934	60282336
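
The table can be checked with a few lines of arithmetic. The sketch below copies the epoch, step, and validation loss columns from the rows above and picks out the epoch with the lowest validation loss; the variable names are illustrative only.

```python
# (epoch, step, validation_loss) copied from the training results table
ROWS = [
    (1, 7518, 0.2678), (2, 15036, 0.2279), (3, 22554, 0.2104),
    (4, 30072, 0.2022), (5, 37590, 0.1975), (6, 45108, 0.1953),
    (7, 52626, 0.1930), (8, 60144, 0.1910), (9, 67662, 0.1932),
    (10, 75180, 0.1912), (11, 82698, 0.1913), (12, 90216, 0.1901),
    (13, 97734, 0.1911), (14, 105252, 0.1933), (15, 112770, 0.1920),
    (16, 120288, 0.1931), (17, 127806, 0.1931), (18, 135324, 0.1934),
    (19, 142842, 0.1936), (20, 150360, 0.1934),
]

# Row with the lowest validation loss.
best_epoch, best_step, best_loss = min(ROWS, key=lambda row: row[2])

# Steps logged per epoch, taken from the first row.
steps_per_epoch = ROWS[0][1]

# Validation loss at the end of training, for comparison with the best value.
final_loss = ROWS[-1][2]

if __name__ == "__main__":
    print(f"best epoch: {best_epoch} (step {best_step}), loss {best_loss}")
    print(f"final loss: {final_loss}, steps per epoch: {steps_per_epoch}")
```

Validation loss improves quickly over the first eight epochs and then stays within a narrow band (0.1901 to 0.1936), so the later epochs add little on this metric.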

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
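
Because PEFT appears among the framework versions, the checkpoint is presumably a PEFT adapter applied on top of the base model rather than a full set of weights. A minimal loading sketch under that assumption follows. The repository id is taken from the model tree line of this card, the helper name `load_adapter` is hypothetical, and access to the base model weights is assumed to be already arranged.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_siqa_789_1760637947"  # assumption: adapter repo id


def load_adapter(base_model_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the adapter weights (illustrative helper)."""
    # Imported lazily so this file can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    # Wrap the base model with the adapter stored under adapter_id.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_adapter()` downloads both the base model and the adapter, so it needs network access and enough memory for an 8B-parameter model.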

Model tree for rbelanec/train_siqa_789_1760637947

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model