train_siqa_42_1760637599

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the siqa dataset. It achieves the following results on the evaluation set:

Loss: 0.5494
Num Input Tokens Seen: 60302568

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.03
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
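
To make the learning-rate settings above concrete, the following is a minimal sketch of the schedule they imply. It assumes the common shape for a cosine scheduler with warmup: a linear ramp from 0 to the peak rate, then a half-cosine decay to 0. The total step count (150360) and steps per epoch (7518) are taken from the training results table; the function name `lr_at` and the rounding used for the warmup length are illustrative choices, not taken from the training code.

```python
import math

# Values taken from the hyperparameter list and the training results table.
PEAK_LR = 0.03
WARMUP_RATIO = 0.1
TOTAL_STEPS = 150360      # step count reported at epoch 20
STEPS_PER_EPOCH = 7518    # step count reported at epoch 1

# Assumption: warmup length is the warmup ratio applied to the total step count.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for epoch in (1, 2, 5, 10, 15, 20):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:>2}  step {step:>6}  lr {lr_at(step):.6f}")
```

Under these assumptions the warmup covers the first 15036 steps, which is exactly the first two epochs, so the peak rate of 0.03 is reached at the end of epoch 2 and decays from there.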

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.5561	1.0	7518	0.5512	3016248
0.5429	2.0	15036	0.5535	6032368
0.5516	3.0	22554	0.5507	9049000
0.5449	4.0	30072	0.5505	12063104
0.5519	5.0	37590	0.5495	15078392
0.5460	6.0	45108	0.5499	18094200
0.5553	7.0	52626	0.5507	21109936
0.5412	8.0	60144	0.5499	24124456
0.5574	9.0	67662	0.5494	27139488
0.5554	10.0	75180	0.5505	30155824
0.5497	11.0	82698	0.5500	33169800
0.5482	12.0	90216	0.5499	36184296
0.5502	13.0	97734	0.5496	39199224
0.5474	14.0	105252	0.5495	42213984
0.5467	15.0	112770	0.5498	45227616
0.5542	16.0	120288	0.5496	48242336
0.5515	17.0	127806	0.5500	51258152
0.5505	18.0	135324	0.5497	54272896
0.5530	19.0	142842	0.5496	57288368
0.5380	20.0	150360	0.5502	60302568
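
As a quick way to read the table, the sketch below copies the validation-loss column and reports the epoch with the lowest value along with the overall spread. The helper names `best_epoch` and `loss_spread` are illustrative. The lowest value matches the headline loss of 0.5494, which suggests (the card does not state this) that the reported figure corresponds to the best checkpoint and not the final one.

```python
# Validation loss per epoch, copied from the results table (epochs 1 through 20).
VALIDATION_LOSS = [
    0.5512, 0.5535, 0.5507, 0.5505, 0.5495,
    0.5499, 0.5507, 0.5499, 0.5494, 0.5505,
    0.5500, 0.5499, 0.5496, 0.5495, 0.5498,
    0.5496, 0.5500, 0.5497, 0.5496, 0.5502,
]


def best_epoch(losses):
    """Return (1-based epoch, loss) for the lowest validation loss."""
    idx = min(range(len(losses)), key=losses.__getitem__)
    return idx + 1, losses[idx]


def loss_spread(losses):
    """Return the gap between the highest and lowest validation loss."""
    return max(losses) - min(losses)


if __name__ == "__main__":
    epoch, loss = best_epoch(VALIDATION_LOSS)
    print(f"best epoch: {epoch}, validation loss: {loss:.4f}")
    print(f"spread across all epochs: {loss_spread(VALIDATION_LOSS):.4f}")
```

The spread across all 20 epochs is about 0.004, so the validation loss is essentially flat after the first epoch.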

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4

Model tree for rbelanec/train_siqa_42_1760637599

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model