train_svamp_789_1760637889

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the svamp dataset. It achieves the following results on the evaluation set:

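Loss: 0.0877 (the lowest validation loss in the training results below, reached at epoch 3)
Num Input Tokens Seen: 1430656 (total after all 20 epochs)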

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
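The cosine schedule with a 0.1 warmup ratio can be made concrete with a small sketch. It assumes the run took 3160 optimizer steps in total (the last step in the results table) and that the warmup length is ceil(0.1 × 3160) = 316 steps, which is how the Transformers Trainer rounds a warmup ratio. The helper names `warmup_steps` and `lr_at` are hypothetical and exist only for illustration; this is not code from the training run.

```python
import math

# Values from the hyperparameter list; TOTAL_STEPS is the last step in the results table.
PEAK_LR = 5e-05
TOTAL_STEPS = 3160          # 20 epochs x 158 optimizer steps per epoch
WARMUP_RATIO = 0.1


def warmup_steps(total_steps: int, ratio: float) -> int:
    # Assumption: warmup length is ceil(ratio * total_steps), mirroring how the
    # Transformers Trainer converts a warmup ratio into a step count.
    return math.ceil(total_steps * ratio)


def lr_at(step: int, total_steps: int = TOTAL_STEPS,
          ratio: float = WARMUP_RATIO, peak: float = PEAK_LR) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to 0."""
    warm = warmup_steps(total_steps, ratio)
    if step < warm:
        # Linear ramp from 0 up to the peak learning rate.
        return peak * step / max(1, warm)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warm) / max(1, total_steps - warm)
    # Half a cosine period: the multiplier falls from 1 to 0.
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 158, 316, 1738, 3160):
        print(f"step {step:>4}: lr = {lr_at(step):.3e}")
```

Under these assumptions the learning rate peaks at 5e-05 at the end of epoch 2 (step 316), is back down to half the peak at step 1738, and reaches 0 at the final step.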

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.0415	1.0	158	0.1214	71280
0.0311	2.0	316	0.1051	143040
0.0276	3.0	474	0.0877	214736
0.0058	4.0	632	0.0993	286128
0.0346	5.0	790	0.0914	357600
0.0021	6.0	948	0.0972	428976
0.0236	7.0	1106	0.1360	500400
0.0009	8.0	1264	0.1425	572048
0.0	9.0	1422	0.1590	643792
0.0	10.0	1580	0.1614	715344
0.0	11.0	1738	0.1635	786864
0.0	12.0	1896	0.1661	858288
0.0	13.0	2054	0.1681	929616
0.0	14.0	2212	0.1691	1001040
0.0	15.0	2370	0.1698	1072880
0.0	16.0	2528	0.1686	1144416
0.0	17.0	2686	0.1712	1215856
0.0	18.0	2844	0.1716	1287504
0.0	19.0	3002	0.1711	1359168
0.0	20.0	3160	0.1708	1430656
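In the table, validation loss bottoms out at epoch 3 (0.0877, the value reported at the top of this card) and then climbs to about 0.17, while training loss falls to 0.0 (as displayed). This pattern is consistent with overfitting after the first few epochs. The short sketch below copies the table rows and picks the epoch with the lowest validation loss. The helper names are hypothetical. The card does not state which checkpoint the published adapter weights come from, although the headline loss matches the epoch-3 row.

```python
# Rows copied from the results table: (epoch, step, validation_loss, input_tokens_seen).
# Training loss is omitted because it is not used here.
ROWS = [
    (1, 158, 0.1214, 71280),
    (2, 316, 0.1051, 143040),
    (3, 474, 0.0877, 214736),
    (4, 632, 0.0993, 286128),
    (5, 790, 0.0914, 357600),
    (6, 948, 0.0972, 428976),
    (7, 1106, 0.1360, 500400),
    (8, 1264, 0.1425, 572048),
    (9, 1422, 0.1590, 643792),
    (10, 1580, 0.1614, 715344),
    (11, 1738, 0.1635, 786864),
    (12, 1896, 0.1661, 858288),
    (13, 2054, 0.1681, 929616),
    (14, 2212, 0.1691, 1001040),
    (15, 2370, 0.1698, 1072880),
    (16, 2528, 0.1686, 1144416),
    (17, 2686, 0.1712, 1215856),
    (18, 2844, 0.1716, 1287504),
    (19, 3002, 0.1711, 1359168),
    (20, 3160, 0.1708, 1430656),
]


def best_epoch(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


def steps_per_epoch(rows):
    """Return the constant step gap between evaluations, or None if it varies."""
    gaps = {later[1] - earlier[1] for earlier, later in zip(rows, rows[1:])}
    return gaps.pop() if len(gaps) == 1 else None


if __name__ == "__main__":
    epoch, step, loss, tokens = best_epoch(ROWS)
    print(f"best epoch {epoch} (step {step}): validation loss {loss}")
    print(f"final epoch validation loss: {ROWS[-1][2]}")
    print(f"optimizer steps per epoch: {steps_per_epoch(ROWS)}")
```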

Framework versions

PEFT 0.17.1
Transformers 4.51.3
PyTorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4


Model tree for rbelanec/train_svamp_789_1760637889

Base model: meta-llama/Meta-Llama-3-8B-Instruct
This model: adapter for the base model