train_svamp_42_1760637543

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the svamp dataset. It achieves the following results on the evaluation set:

Loss: 0.0827
Num Input Tokens Seen: 1433520

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
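
To make the cosine schedule with a 0.1 warmup ratio concrete, the following is a minimal sketch of how the learning rate might evolve over the run. It assumes the common shape of linear warmup followed by a half-cosine decay to zero, and it assumes the total step count is the final step (3160) reported in the training results; `lr_at` is a hypothetical helper name, not part of any library.

```python
import math

LEARNING_RATE = 0.001  # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 3160     # assumption: final step reported in the training results


def lr_at(step, base_lr=LEARNING_RATE, total=TOTAL_STEPS, warmup_ratio=WARMUP_RATIO):
    """Learning rate at a given optimizer step (hypothetical helper).

    Assumes linear warmup from 0 to base_lr, then half-cosine decay to 0.
    """
    warmup = math.ceil(total * warmup_ratio)
    if step < warmup:
        # Linear warmup phase
        return base_lr * step / max(1, warmup)
    # Fraction of the decay phase completed, in [0, 1]
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 158, 316, 1738, 3160):
        print(step, lr_at(step))
```

Under these assumptions the warmup covers the first 316 steps, which corresponds to the first two epochs of the run.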

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.0814	1.0	158	0.4345	71568
0.056	2.0	316	0.0867	143232
0.0493	3.0	474	0.0727	214912
0.107	4.0	632	0.0583	286448
0.0361	5.0	790	0.0556	358176
0.0469	6.0	948	0.0544	429728
0.035	7.0	1106	0.0422	501504
0.0461	8.0	1264	0.0435	573120
0.0055	9.0	1422	0.0514	644944
0.0283	10.0	1580	0.0590	716448
0.0121	11.0	1738	0.0728	788256
0.0041	12.0	1896	0.0705	859808
0.0121	13.0	2054	0.0737	931472
0.0007	14.0	2212	0.0768	1003376
0.0012	15.0	2370	0.0877	1075088
0.0006	16.0	2528	0.0871	1146608
0.0004	17.0	2686	0.0893	1218368
0.0002	18.0	2844	0.0901	1290144
0.0011	19.0	3002	0.0906	1361984
0.0002	20.0	3160	0.0907	1433520
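
As a small illustration of how the table above can be read, the sketch below finds the epoch with the lowest validation loss. The rows are copied from the table (epoch, training loss, validation loss); `best_epoch` is a hypothetical helper name introduced only for this example.

```python
# (epoch, training loss, validation loss), copied from the results table
RESULTS = [
    (1, 0.0814, 0.4345), (2, 0.056, 0.0867), (3, 0.0493, 0.0727),
    (4, 0.107, 0.0583), (5, 0.0361, 0.0556), (6, 0.0469, 0.0544),
    (7, 0.035, 0.0422), (8, 0.0461, 0.0435), (9, 0.0055, 0.0514),
    (10, 0.0283, 0.0590), (11, 0.0121, 0.0728), (12, 0.0041, 0.0705),
    (13, 0.0121, 0.0737), (14, 0.0007, 0.0768), (15, 0.0012, 0.0877),
    (16, 0.0006, 0.0871), (17, 0.0004, 0.0893), (18, 0.0002, 0.0901),
    (19, 0.0011, 0.0906), (20, 0.0002, 0.0907),
]


def best_epoch(results):
    """Return the row with the smallest validation loss (hypothetical helper)."""
    return min(results, key=lambda row: row[2])


if __name__ == "__main__":
    epoch, train_loss, val_loss = best_epoch(RESULTS)
    print(f"lowest validation loss {val_loss} at epoch {epoch}")
```

In this table the minimum falls at epoch 7 (validation loss 0.0422). After that point the validation loss drifts upward while the training loss keeps shrinking, which is consistent with overfitting in the later epochs.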

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
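
Because PEFT appears among the framework versions, the checkpoint is presumably published as an adapter on top of the base model. The following is a minimal loading sketch under that assumption. The adapter repository id is taken from the model tree entry on this page and is an assumption about where the weights live; `load_adapter_model` is a hypothetical helper name. Access to the gated base model is also required.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
# Assumption: adapter repo id as shown in the model tree entry on this page
ADAPTER_ID = "rbelanec/train_svamp_42_1760637543"


def load_adapter_model(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the adapter (hypothetical helper)."""
    # Local imports so the constants can be inspected without the libraries installed
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the fine-tuned adapter weights
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_adapter_model()` downloads the base weights and the adapter, so it needs sufficient memory for an 8B-parameter model; dtype and device placement are left at library defaults here and would likely need adjusting in practice.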

Model tree for rbelanec/train_svamp_42_1760637543

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model