train_svamp_789_1760637888

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the svamp dataset. It achieves the following results on the evaluation set:

Loss: 0.8562
Num Input Tokens Seen: 1430656

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
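
The cosine schedule with a 10% warmup can be sketched as follows. This is a minimal illustration, not the training code itself: it assumes 3160 total optimizer steps (the final step in the training results table), a linear warmup from zero, and a half-cosine decay to zero afterwards. The helper name `lr_at_step` is hypothetical.

```python
import math

PEAK_LR = 0.001        # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 3160     # assumption: last step reported in the training results


def lr_at_step(step: int) -> float:
    """Hypothetical helper: linear warmup, then half-cosine decay to zero."""
    warmup_steps = round(TOTAL_STEPS * WARMUP_RATIO)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the learning rate at a few representative steps.
for step in (0, 158, 316, 1738, 3160):
    print(step, f"{lr_at_step(step):.6f}")
```

Under these assumptions the learning rate peaks at the end of epoch 2 (step 316) and decays for the remaining 18 epochs.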

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.0672	1.0	158	0.2070	71280
0.0637	2.0	316	0.1112	143040
0.0536	3.0	474	0.0973	214736
0.042	4.0	632	0.0964	286128
0.073	5.0	790	0.0950	357600
0.0289	6.0	948	0.0984	428976
0.0637	7.0	1106	0.1277	500400
0.058	8.0	1264	0.1081	572048
0.0221	9.0	1422	0.1184	643792
0.0081	10.0	1580	0.1304	715344
0.0286	11.0	1738	0.1277	786864
0.0041	12.0	1896	0.1406	858288
0.0004	13.0	2054	0.1334	929616
0.0001	14.0	2212	0.1409	1001040
0.0004	15.0	2370	0.1455	1072880
0.0004	16.0	2528	0.1481	1144416
0.0003	17.0	2686	0.1488	1215856
0.0001	18.0	2844	0.1492	1287504
0.0006	19.0	3002	0.1507	1359168
0.0001	20.0	3160	0.1499	1430656
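
One way to read the table is to locate the epoch with the lowest validation loss. The sketch below copies the validation-loss column from the table; the variable names are illustrative, and it makes no claim about which checkpoint was actually published.

```python
# Validation loss per epoch, copied from the training results (epochs 1..20).
val_loss = [
    0.2070, 0.1112, 0.0973, 0.0964, 0.0950,
    0.0984, 0.1277, 0.1081, 0.1184, 0.1304,
    0.1277, 0.1406, 0.1334, 0.1409, 0.1455,
    0.1481, 0.1488, 0.1492, 0.1507, 0.1499,
]

# Index of the minimum, shifted by one because epochs are numbered from 1.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
best_loss = val_loss[best_epoch - 1]
final_loss = val_loss[-1]

print(f"lowest validation loss {best_loss:.4f} at epoch {best_epoch}")
print(f"final-epoch validation loss {final_loss:.4f}")
```

On these numbers the minimum falls at epoch 5 (0.0950), while the training loss keeps falling toward zero in later epochs, a pattern commonly associated with overfitting.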

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
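
To compare a local environment against these versions, a small check along the following lines may help. It is only a sketch: the distribution names in `EXPECTED` are assumed to be the usual PyPI names for these libraries, and `check_versions` is a hypothetical helper, not part of any of them.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Versions from the list above; keys are the assumed PyPI distribution names.
EXPECTED = {
    "peft": "0.17.1",
    "transformers": "4.51.3",
    "torch": "2.9.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}


def check_versions(expected: dict) -> dict:
    """Hypothetical helper: map each package to (wanted, installed or None)."""
    report = {}
    for name, wanted in expected.items():
        installed: Optional[str]
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in check_versions(EXPECTED).items():
    status = "ok" if installed == wanted else "differs"
    print(f"{name}: wanted {wanted}, found {installed} ({status})")
```

A mismatch does not necessarily mean the adapter will fail to load; the check only reports where the environment differs from the one recorded here.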

Model tree for rbelanec/train_svamp_789_1760637888

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model