train_siqa_789_1760637946

This model is a PEFT adapter for meta-llama/Meta-Llama-3-8B-Instruct, fine-tuned on the siqa dataset. It achieves the following results on the evaluation set:

Loss: 0.6400
Input tokens seen: 60282336

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08, no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
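The `cosine` scheduler with `lr_scheduler_warmup_ratio: 0.1` ramps the learning rate linearly from 0 to the peak of 5e-05, then decays it along half a cosine curve back to 0. The sketch below reproduces that shape in plain Python. It assumes the total number of optimizer steps equals the last step in the training-results table (150360) and that the warmup length is rounded up, as Transformers' `TrainingArguments` does; the helper names `warmup_steps` and `lr_at` are illustrative, not the code used for this run.

```python
import math

# Values from the hyperparameter list; TOTAL_STEPS is assumed to be the final
# step in the training-results table (20 epochs x 7518 steps).
PEAK_LR = 5e-05
WARMUP_RATIO = 0.1
TOTAL_STEPS = 150360


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    """Number of linear warmup steps, rounded up (assumed to mirror TrainingArguments)."""
    return math.ceil(total_steps * warmup_ratio)


def lr_at(step: int, peak_lr: float = PEAK_LR, total_steps: int = TOTAL_STEPS,
          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to 0."""
    n_warmup = warmup_steps(total_steps, warmup_ratio)
    if step < n_warmup:
        # Linear ramp from 0 up to peak_lr.
        return peak_lr * step / max(1, n_warmup)
    # Fraction of the post-warmup phase already completed, in [0, 1].
    progress = (step - n_warmup) / max(1, total_steps - n_warmup)
    # Half a cosine period: the multiplier goes from 1 to 0 as progress goes 0 -> 1.
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 7518, 15036, 82698, 150360):
        print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

Under these assumptions the learning rate peaks at 5e-05 at step 15036 (end of epoch 2), is back down to 2.5e-05 at step 82698 (end of epoch 11), and reaches 0 at the final step.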

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.7441	1.0	7518	0.6467	3013096
0.2941	2.0	15036	0.6436	6027776
0.3547	3.0	22554	0.6418	9041456
0.6727	4.0	30072	0.6401	12057104
0.807	5.0	37590	0.6416	15069560
0.4643	6.0	45108	0.6400	18083408
1.1001	7.0	52626	0.6429	21096352
0.1748	8.0	60144	0.6419	24110848
0.2459	9.0	67662	0.6430	27125432
1.2622	10.0	75180	0.6418	30138560
0.1197	11.0	82698	0.6417	33152696
0.3495	12.0	90216	0.6430	36168144
0.7213	13.0	97734	0.6427	39182840
0.3153	14.0	105252	0.6410	42195440
0.9943	15.0	112770	0.6403	45209672
0.4712	16.0	120288	0.6431	48221640
0.4297	17.0	127806	0.6416	51235560
0.6579	18.0	135324	0.6401	54251480
0.8982	19.0	142842	0.6437	57267064
0.2255	20.0	150360	0.6437	60282336
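The table is easier to read with a few summary numbers. This sketch parses the rows copied from the table (whitespace-separated here for convenience), selects the epoch with the lowest validation loss, and computes the average number of input tokens per optimizer step. The `EpochResult` class and the helper functions are illustrative names, not part of any library.

```python
from dataclasses import dataclass

# Rows of the training-results table:
# training loss, epoch, step, validation loss, input tokens seen.
TABLE = """
0.7441 1.0 7518 0.6467 3013096
0.2941 2.0 15036 0.6436 6027776
0.3547 3.0 22554 0.6418 9041456
0.6727 4.0 30072 0.6401 12057104
0.807 5.0 37590 0.6416 15069560
0.4643 6.0 45108 0.6400 18083408
1.1001 7.0 52626 0.6429 21096352
0.1748 8.0 60144 0.6419 24110848
0.2459 9.0 67662 0.6430 27125432
1.2622 10.0 75180 0.6418 30138560
0.1197 11.0 82698 0.6417 33152696
0.3495 12.0 90216 0.6430 36168144
0.7213 13.0 97734 0.6427 39182840
0.3153 14.0 105252 0.6410 42195440
0.9943 15.0 112770 0.6403 45209672
0.4712 16.0 120288 0.6431 48221640
0.4297 17.0 127806 0.6416 51235560
0.6579 18.0 135324 0.6401 54251480
0.8982 19.0 142842 0.6437 57267064
0.2255 20.0 150360 0.6437 60282336
"""


@dataclass
class EpochResult:
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    tokens_seen: int


def parse(table: str) -> list:
    """Turn each non-empty line of the table into an EpochResult."""
    rows = []
    for line in table.strip().splitlines():
        loss, epoch, step, val, tokens = line.split()
        rows.append(EpochResult(float(loss), float(epoch), int(step), float(val), int(tokens)))
    return rows


def best_epoch(rows: list) -> EpochResult:
    """Row with the lowest validation loss (the earliest one wins on ties)."""
    return min(rows, key=lambda r: r.val_loss)


if __name__ == "__main__":
    rows = parse(TABLE)
    best, last = best_epoch(rows), rows[-1]
    print(f"best epoch: {best.epoch:g} (val loss {best.val_loss:.4f})")
    print(f"final epoch: {last.epoch:g} (val loss {last.val_loss:.4f})")
    print(f"avg input tokens per step: {last.tokens_seen / last.step:.1f}")
```

On these rows the lowest validation loss is 0.6400 at epoch 6 (step 45108), the same value reported as the evaluation loss at the top of the card, while the final epoch ends at 0.6437. Across all 20 epochs the validation loss stays between 0.6400 and 0.6467.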

Framework versions

PEFT 0.17.1
Transformers 4.51.3
PyTorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
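When reproducing results, it can help to compare the local environment against these versions. This sketch uses `importlib.metadata` from the standard library. It assumes the PyPI distribution names `peft`, `transformers`, `torch`, `datasets`, and `tokenizers`, and it compares version strings exactly, so a CPU-only `torch` build reporting `2.9.0` would be flagged as different from `2.9.0+cu128`. The `mismatches` helper is a hypothetical name.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional, Tuple

# Versions listed under "Framework versions", keyed by (assumed) PyPI distribution name.
EXPECTED = {
    "peft": "0.17.1",
    "transformers": "4.51.3",
    "torch": "2.9.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}


def mismatches(expected: Dict[str, str],
               lookup: Callable[[str], str] = version) -> Dict[str, Tuple[str, Optional[str]]]:
    """Return {package: (expected, installed)} for every package that differs.

    `installed` is None when the package is not installed at all.
    """
    out = {}
    for name, want in expected.items():
        try:
            have = lookup(name)
        except PackageNotFoundError:
            have = None
        if have != want:
            out[name] = (want, have)
    return out


if __name__ == "__main__":
    diffs = mismatches(EXPECTED)
    if not diffs:
        print("environment matches the card")
    for name, (want, have) in diffs.items():
        print(f"{name}: card lists {want}, installed {have or 'nothing'}")
```

An empty result means every listed package matches the card exactly.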


Model tree for rbelanec/train_siqa_789_1760637946

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model (rbelanec/train_siqa_789_1760637946)
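Because this repository is an adapter rather than a full set of weights, it has to be attached to the base model at load time. The sketch below does that with PEFT's `PeftModel.from_pretrained`, assuming the repository holds a standard PEFT adapter and that you have accepted the license for the gated Llama 3 base model on the Hub. The `load_adapter` helper, loading the tokenizer from the base model, and the bfloat16 dtype are our choices, not something the card specifies. The prompt format used during fine-tuning is not documented, so the sketch stops at loading.

```python
# Hub IDs taken from this card.
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_siqa_789_1760637946"


def load_adapter(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and tokenizer, then attach the PEFT adapter.

    Imports are deferred so this module can be imported without torch,
    transformers, or peft installed; calling it downloads the base model.
    """
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the adapter repo reuses the base model's tokenizer.
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    # PeftModel reads the adapter config and weights and wraps the base model.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `tokenizer, model = load_adapter()` returns a model ready for inference.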