train_multirc_789_1770132513

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the multirc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3186
  • Num Input Tokens Seen: 264395536
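
Below is a minimal inference sketch, not the authors' own code. It assumes the adapter is published as rbelanec/train_multirc_789_1770132513 (the id given in this card's model tree) and that you have access to the gated meta-llama/Meta-Llama-3-8B-Instruct base weights.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_multirc_789_1770132513"  # repo id from this card's model tree

# Load the frozen base model, then attach the fine-tuned MultiRC adapter on top.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```

Note that the prompt template applied to multirc examples during fine-tuning is not documented on this card (see "Training and evaluation data" below), so inputs at inference time should follow whatever format the training script used.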

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 789
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
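
As a sketch, the values above map onto Hugging Face TrainingArguments roughly as follows. The PEFT configuration (adapter type, rank, target modules) and dataset preprocessing are not documented on this card, so anything beyond the listed values is an assumption.

```python
from transformers import TrainingArguments

# Hypothetical reconstruction of the training configuration from the listed values.
# adam_beta1/adam_beta2/adam_epsilon spell out betas=(0.9, 0.999) and epsilon=1e-08,
# which are also the adamw_torch defaults.
training_args = TrainingArguments(
    output_dir="train_multirc_789_1770132513",  # hypothetical output path
    learning_rate=0.03,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=789,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```

The comparatively high learning rate of 0.03 is more typical of prompt-style PEFT methods than of full fine-tuning, though the adapter type is not stated on this card.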

Training results

Training Loss   Epoch   Step     Validation Loss   Input Tokens Seen
0.3101          1.0     6130     0.3239            13229504
0.2580          2.0     12260    0.3212            26459312
0.2858          3.0     18390    0.3202            39686560
0.3227          4.0     24520    0.3193            52924864
0.3292          5.0     30650    0.3198            66146528
0.3253          6.0     36780    0.3195            79364192
0.4473          7.0     42910    0.3259            92568704
0.3240          8.0     49040    0.3208            105788704
0.2602          9.0     55170    0.3190            119004560
0.2671          10.0    61300    0.3194            132223184
0.3743          11.0    67430    0.3214            145445936
0.3852          12.0    73560    0.3194            158686320
0.3533          13.0    79690    0.3192            171910720
0.2744          14.0    85820    0.3186            185139040
0.2871          15.0    91950    0.3200            198344384
0.2853          16.0    98080    0.3240            211581728
0.4111          17.0    104210   0.3244            224787008
0.3044          18.0    110340   0.3247            238003072
0.2840          19.0    116470   0.3254            251207552
0.2732          20.0    122600   0.3258            264395536

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
