# train_siqa_456_1760637831
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the siqa dataset. It achieves the following results on the evaluation set:

- Loss: 0.2033 (the best validation loss, reached at epoch 2; see the results table below)
- Num input tokens seen: 60,272,064
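Since PEFT appears in the framework versions below, the repo most likely hosts an adapter rather than full model weights. A minimal inference sketch, assuming the adapter loads with `peft.AutoPeftModelForCausalLM` and that the base model's tokenizer is used (the prompt is illustrative, not from the training data):

```python
# A minimal sketch: load the PEFT adapter on top of the base model.
# Assumes the repo contains an adapter config; device_map="auto"
# additionally requires the `accelerate` package.
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "rbelanec/train_siqa_456_1760637831"
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

prompt = "Question: Why did Sasha smile? Answer:"  # illustrative SIQA-style prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```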
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (an equivalent `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 456
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
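A hedged sketch of how these values map onto `transformers.TrainingArguments`; the output directory and the evaluation/saving strategies are placeholders and assumptions, not taken from the original run:

```python
# A sketch of TrainingArguments matching the listed hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_siqa_456_1760637831",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
    eval_strategy="epoch",  # assumption: per-epoch eval matches the results table
    save_strategy="epoch",  # assumption
)
```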
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.2224 | 1.0 | 7518 | 0.2244 | 3015336 |
| 0.0608 | 2.0 | 15036 | 0.2033 | 6029736 |
| 0.1228 | 3.0 | 22554 | 0.2198 | 9044064 |
| 0.0138 | 4.0 | 30072 | 0.3153 | 12056056 |
| 0.2185 | 5.0 | 37590 | 0.4288 | 15070152 |
| 0.2279 | 6.0 | 45108 | 0.5203 | 18083976 |
| 0.0005 | 7.0 | 52626 | 0.5584 | 21097056 |
| 0.0001 | 8.0 | 60144 | 0.5579 | 24109664 |
| 0.0003 | 9.0 | 67662 | 0.6020 | 27122784 |
| 0.0 | 10.0 | 75180 | 0.6767 | 30139392 |
| 0.0004 | 11.0 | 82698 | 0.6502 | 33151800 |
| 0.0509 | 12.0 | 90216 | 0.5445 | 36165976 |
| 0.0 | 13.0 | 97734 | 0.6952 | 39180248 |
| 0.0 | 14.0 | 105252 | 0.8324 | 42193928 |
| 0.0 | 15.0 | 112770 | 0.9043 | 45207272 |
| 0.0 | 16.0 | 120288 | 0.8563 | 48219232 |
| 0.0 | 17.0 | 127806 | 1.0298 | 51231624 |
| 0.0 | 18.0 | 135324 | 1.0885 | 54245832 |
| 0.0 | 19.0 | 142842 | 1.1213 | 57258952 |
| 0.0 | 20.0 | 150360 | 1.1215 | 60272064 |
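The validation loss bottoms out at epoch 2 (0.2033, the value reported above) and climbs steadily afterwards, a typical overfitting pattern. If rerunning the training, an early-stopping hook such as transformers' `EarlyStoppingCallback` could capture the best checkpoint; a minimal sketch, assuming a `Trainer` named `trainer` is already configured with per-epoch evaluation:

```python
# A minimal sketch: stop once eval loss fails to improve for 3 evaluations.
# Assumes trainer was built with load_best_model_at_end=True and
# metric_for_best_model="eval_loss".
from transformers import EarlyStoppingCallback

trainer.add_callback(EarlyStoppingCallback(early_stopping_patience=3))
trainer.train()
```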
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4