train_multirc_456_1767502358

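This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the multirc dataset. It achieves the following results on the evaluation set (the loss equals the lowest validation loss in the training results, reached at epoch 5; the token count is the total after all 20 epochs):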

Loss: 0.1383
Num Input Tokens Seen: 264580656

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 456
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
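
To make the schedule settings concrete, here is a minimal sketch of how the learning rate could evolve under a cosine schedule with a 0.1 warmup ratio. It assumes linear warmup from zero, a single half-cosine decay to zero, and warmup steps computed as the ceiling of ratio times total steps; the total of 122600 steps is the final step reported in the training results. The function name `lr_at` is hypothetical and not part of any library.

```python
import math

PEAK_LR = 5e-05        # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 122600   # final step reported in the training results (20 epochs)

# Assumption: warmup length is the ceiling of ratio * total steps.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper).

    Linear warmup from 0 to PEAK_LR, then a half-cosine decay to 0.
    """
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # Step 30650 is the end of epoch 5 in the training results.
    for step in (0, WARMUP_STEPS, 30650, TOTAL_STEPS):
        print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

Under these assumptions, warmup covers roughly the first two epochs, so the learning rate is still near its peak when the validation loss bottoms out at epoch 5.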

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.2213	1.0	6130	0.1684	13210560
0.1061	2.0	12260	0.1579	26427632
0.0414	3.0	18390	0.1466	39656608
0.0137	4.0	24520	0.1405	52911264
0.2727	5.0	30650	0.1383	66151456
0.1947	6.0	36780	0.1599	79368416
0.0038	7.0	42910	0.1845	92601264
0.0038	8.0	49040	0.1872	105825424
0.0136	9.0	55170	0.2164	119051808
0.0095	10.0	61300	0.2484	132282512
0.0032	11.0	67430	0.2736	145503424
0.0021	12.0	73560	0.3133	158718688
0.0012	13.0	79690	0.3675	171968272
0.0001	14.0	85820	0.3954	185192912
0.0028	15.0	91950	0.4531	198406496
0.0004	16.0	98080	0.4826	211654512
0.0002	17.0	104210	0.4862	224875552
0.065	18.0	110340	0.4976	238097088
0.0016	19.0	116470	0.5004	251336240
0.0001	20.0	122600	0.5012	264580656
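
In the table above, validation loss falls through epoch 5 and then rises steadily while training loss approaches zero, a pattern usually read as overfitting. The sketch below locates the best epoch from the validation column; the values are copied from the table, and the helper name `best_epoch` is hypothetical.

```python
# Validation loss per epoch (epochs 1..20), copied from the training results table.
VAL_LOSS = [
    0.1684, 0.1579, 0.1466, 0.1405, 0.1383,
    0.1599, 0.1845, 0.1872, 0.2164, 0.2484,
    0.2736, 0.3133, 0.3675, 0.3954, 0.4531,
    0.4826, 0.4862, 0.4976, 0.5004, 0.5012,
]


def best_epoch(losses):
    """Return (1-based epoch, loss) for the lowest validation loss."""
    idx = min(range(len(losses)), key=losses.__getitem__)
    return idx + 1, losses[idx]


if __name__ == "__main__":
    epoch, loss = best_epoch(VAL_LOSS)
    print(f"best epoch: {epoch}, validation loss: {loss}")
    print(f"final / best validation loss: {VAL_LOSS[-1] / loss:.2f}x")
```

The lowest value, 0.1383 at epoch 5, matches the evaluation loss reported at the top of the card.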

Framework versions

PEFT 0.15.2
Transformers 4.51.3
Pytorch 2.8.0+cu128
Datasets 3.6.0
Tokenizers 0.21.1
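
Since PEFT is among the listed frameworks, the published weights are presumably an adapter applied on top of the base model. The following is a minimal loading sketch, not a documented usage recipe: the adapter repository ID is taken from the model page, the `bfloat16` dtype and `device_map="auto"` are assumptions about the target hardware, and `load_adapter` is a hypothetical helper. The base model is gated on the Hugging Face Hub, so access must be granted first. The prompt format used during fine-tuning is not documented in this card.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_multirc_456_1767502358"  # repository ID as shown on the model page


def load_adapter(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the PEFT adapter (hypothetical helper)."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.bfloat16,  # assumption: hardware supports bf16
        device_map="auto",           # assumption: the accelerate package is installed
    )
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return model, tokenizer


if __name__ == "__main__":
    model, tokenizer = load_adapter()
    print(type(model).__name__)
```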
