train_svamp_456_1760637775

This model is a PEFT adapter for meta-llama/Meta-Llama-3-8B-Instruct, fine-tuned on the svamp dataset. It achieves the following results on the evaluation set after the final epoch (epoch 20, step 3160):

Loss: 2.3574
Num Input Tokens Seen: 1432752

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 456
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
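The hyperparameters above can be sketched as keyword arguments for `transformers.TrainingArguments`, together with the learning-rate curve they imply. This is a reconstruction under assumptions, not the original training script: the mapping of `train_batch_size` and `eval_batch_size` to per-device batch sizes, the names `HYPERPARAMS`, `build_training_arguments`, `warmup_steps` and `lr_at_step`, and the `output_dir` value are all hypothetical. The total of 3160 optimizer steps is taken from the last row of the training results table, and the schedule formula is the usual linear warmup followed by a half-cosine decay to zero.

```python
import math

# Hyperparameters copied from the list above, expressed with TrainingArguments
# keyword names. Assumption: the card's batch sizes are per-device values.
HYPERPARAMS = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "seed": 456,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "num_train_epochs": 20,
}


def build_training_arguments(output_dir="outputs"):
    """Build TrainingArguments; output_dir is a hypothetical placeholder."""
    # Imported lazily so the rest of the sketch runs without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **HYPERPARAMS)


def warmup_steps(total_steps, ratio=HYPERPARAMS["warmup_ratio"]):
    """Number of linear warmup steps implied by the warmup ratio."""
    return math.ceil(total_steps * ratio)


def lr_at_step(step, total_steps, base_lr=HYPERPARAMS["learning_rate"]):
    """Learning rate under linear warmup followed by cosine decay to zero."""
    warmup = warmup_steps(total_steps)
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    total = 3160  # final step in the training results table
    print("warmup steps:", warmup_steps(total))
    for step in (0, 158, 316, 1738, 3160):
        print(step, lr_at_step(step, total))
```

Under these assumptions the warmup covers the first 316 steps, which is exactly the first two epochs in the results table, and the learning rate reaches zero at step 3160.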

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
2.3114	1.0	158	2.3960	71728
2.3629	2.0	316	2.3916	143392
2.4440	3.0	474	2.3753	214928
2.2586	4.0	632	2.3644	286624
2.4396	5.0	790	2.3658	358096
2.2935	6.0	948	2.3612	429680
2.2293	7.0	1106	2.3599	501168
2.3159	8.0	1264	2.3636	573152
2.3399	9.0	1422	2.3619	644672
2.3443	10.0	1580	2.3611	716272
2.3384	11.0	1738	2.3580	787952
2.4792	12.0	1896	2.3615	859584
2.3733	13.0	2054	2.3607	931280
2.3321	14.0	2212	2.3581	1002960
2.2736	15.0	2370	2.3597	1074528
2.4557	16.0	2528	2.3618	1146144
2.2923	17.0	2686	2.3616	1217760
2.3658	18.0	2844	2.3588	1289504
2.3829	19.0	3002	2.3574	1361040
2.3663	20.0	3160	2.3574	1432752
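The table can be summarized with a few lines of arithmetic. The sketch below copies the epoch, step, validation loss and token columns from the table and reports the best validation loss, the steps per epoch, and the total change in validation loss. The names `ROWS` and `summarize` are hypothetical, and the examples-per-epoch figure assumes a single device with no gradient accumulation, which the card does not state.

```python
# (epoch, step, validation_loss, input_tokens_seen), copied from the table above.
ROWS = [
    (1, 158, 2.3960, 71728),
    (2, 316, 2.3916, 143392),
    (3, 474, 2.3753, 214928),
    (4, 632, 2.3644, 286624),
    (5, 790, 2.3658, 358096),
    (6, 948, 2.3612, 429680),
    (7, 1106, 2.3599, 501168),
    (8, 1264, 2.3636, 573152),
    (9, 1422, 2.3619, 644672),
    (10, 1580, 2.3611, 716272),
    (11, 1738, 2.3580, 787952),
    (12, 1896, 2.3615, 859584),
    (13, 2054, 2.3607, 931280),
    (14, 2212, 2.3581, 1002960),
    (15, 2370, 2.3597, 1074528),
    (16, 2528, 2.3618, 1146144),
    (17, 2686, 2.3616, 1217760),
    (18, 2844, 2.3588, 1289504),
    (19, 3002, 2.3574, 1361040),
    (20, 3160, 2.3574, 1432752),
]

TRAIN_BATCH_SIZE = 4  # from the hyperparameter list


def summarize(rows=ROWS):
    """Return a small summary of the validation-loss curve."""
    best = min(rows, key=lambda row: row[2])  # first row with the lowest loss
    steps_per_epoch = rows[0][1]
    return {
        "best_epoch": best[0],
        "best_validation_loss": best[2],
        "steps_per_epoch": steps_per_epoch,
        # Assumption: one device, no gradient accumulation.
        "max_examples_per_epoch": steps_per_epoch * TRAIN_BATCH_SIZE,
        "validation_loss_drop": rows[0][2] - rows[-1][2],
        "mean_tokens_per_epoch": rows[-1][3] / len(rows),
    }


if __name__ == "__main__":
    for key, value in summarize().items():
        print(f"{key}: {value}")
```

The lowest validation loss (2.3574) is first reached at epoch 19 and is unchanged at epoch 20. The change from the first to the last epoch is about 0.04, so most of the 20 epochs move the validation loss very little.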

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
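To compare a local environment against the versions listed above, a small check along these lines may help. It is a sketch only: the names `EXPECTED_VERSIONS` and `compare_versions` are hypothetical, and it assumes the packages are installed under their usual distribution names (`peft`, `transformers`, `torch`, `datasets`, `tokenizers`).

```python
from importlib import metadata

# Versions copied from the list above, keyed by the assumed distribution name.
EXPECTED_VERSIONS = {
    "peft": "0.17.1",
    "transformers": "4.51.3",
    "torch": "2.9.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}


def compare_versions(expected=EXPECTED_VERSIONS):
    """Map each package to (expected, installed or None, exact match?)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for package, (wanted, installed, matches) in compare_versions().items():
        status = "ok" if matches else "differs"
        print(f"{package}: expected {wanted}, found {installed} ({status})")
```

An exact match is not necessarily required to use the adapter; the check only reports where the environment differs from the one recorded in this card.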

Model tree for rbelanec/train_svamp_456_1760637775

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model
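A minimal sketch of loading this adapter on top of its base model with PEFT, assuming the adapter is published under the repository id shown in the model tree heading and that the environment has access to the base model weights (the Llama 3 repositories are gated on the Hub). The function name `load_adapter_model` and the choice of `bfloat16` are illustrative assumptions, not settings recorded in this card.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_svamp_456_1760637775"


def load_adapter_model(base_model_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model, then attach the fine-tuned PEFT adapter to it."""
    # Imported lazily: these are heavy dependencies and the download is large.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        torch_dtype=torch.bfloat16,  # assumption: chosen here to reduce memory use
    )
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_adapter_model()
    print(type(model).__name__)
```

Calling `load_adapter_model()` downloads the 8B base model, so it needs network access, accepted license terms for the base model, and enough memory to hold the weights.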