# train_multirc_789_1770290311
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the multirc dataset. It achieves the following results on the evaluation set:
- Loss: 0.1570
- Num Input Tokens Seen: 264395536
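
The framework versions below list PEFT, so this checkpoint is presumably a PEFT adapter on top of the base model. The following is a minimal inference sketch, assuming the adapter is hosted under the repository id in the title; the prompt format is illustrative, not the one used in training.

```python
# Minimal inference sketch. Assumptions: this repo hosts a PEFT adapter for
# meta-llama/Meta-Llama-3-8B-Instruct, and the adapter repo id matches the
# card title. The prompt format below is illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_multirc_789_1770290311"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

prompt = (
    "Paragraph: ...\nQuestion: ...\nCandidate answer: ...\n"
    "Is the candidate answer correct? Answer yes or no."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```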
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
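
The preprocessing used for this run is not documented. MultiRC is a SuperGLUE task, though, so the raw splits can be inspected as sketched below; the prompt formatting is purely illustrative.

```python
# Illustrative only: load the SuperGLUE MultiRC splits and format one example.
# The actual prompt template and label encoding used for this run are unknown.
from datasets import load_dataset

multirc = load_dataset("super_glue", "multirc")
example = multirc["train"][0]

prompt = (
    f"Paragraph: {example['paragraph']}\n"
    f"Question: {example['question']}\n"
    f"Candidate answer: {example['answer']}\n"
    "Is the candidate answer correct? Answer yes or no."
)
print(prompt)
print("Gold label:", "yes" if example["label"] == 1 else "no")
```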
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 789
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
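
A rough sketch of how these values map onto a Hugging Face `TrainingArguments` object is shown below; only the values listed above come from this card, while the output directory, evaluation/logging strategies, and anything related to the LoRA/PEFT setup are assumptions.

```python
# Sketch only: maps the listed hyperparameters onto TrainingArguments.
# output_dir and the per-epoch eval/logging strategies are assumptions
# (the results table below reports one validation loss per epoch).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_multirc_789_1770290311",  # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=789,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
    eval_strategy="epoch",
    logging_strategy="epoch",
)
```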
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.3754 | 1.0 | 6130 | 0.1825 | 13229504 |
| 0.4394 | 2.0 | 12260 | 0.1570 | 26459312 |
| 0.0838 | 3.0 | 18390 | 0.1607 | 39686560 |
| 0.0171 | 4.0 | 24520 | 0.1571 | 52924864 |
| 0.0078 | 5.0 | 30650 | 0.1576 | 66146528 |
| 0.1329 | 6.0 | 36780 | 0.1785 | 79364192 |
| 0.0027 | 7.0 | 42910 | 0.2149 | 92568704 |
| 0.0632 | 8.0 | 49040 | 0.2025 | 105788704 |
| 0.0017 | 9.0 | 55170 | 0.2569 | 119004560 |
| 0.1296 | 10.0 | 61300 | 0.2513 | 132223184 |
| 0.0026 | 11.0 | 67430 | 0.2681 | 145445936 |
| 0.0005 | 12.0 | 73560 | 0.3659 | 158686320 |
| 0.0006 | 13.0 | 79690 | 0.3957 | 171910720 |
| 0.0007 | 14.0 | 85820 | 0.3975 | 185139040 |
| 0.0001 | 15.0 | 91950 | 0.4505 | 198344384 |
| 0.0092 | 16.0 | 98080 | 0.4601 | 211581728 |
| 0.0004 | 17.0 | 104210 | 0.4789 | 224787008 |
| 0.0003 | 18.0 | 110340 | 0.4931 | 238003072 |
| 0.0013 | 19.0 | 116470 | 0.4958 | 251207552 |
| 0.1408 | 20.0 | 122600 | 0.4963 | 264395536 |
### Framework versions
- PEFT 0.15.2
- Transformers 4.51.3
- Pytorch 2.8.0+cu128
- Datasets 3.6.0
- Tokenizers 0.21.1