train_multirc_42_1762193638

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the multirc dataset. It achieves the following results on the evaluation set:

Loss: 0.2830
Num Input Tokens Seen: 264840880

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.03
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
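To make the schedule concrete, the following is a minimal sketch of how a cosine schedule with linear warmup would shape the learning rate over this run. It assumes the total step count is 122600 (the final step in the training results table), that warmup covers 10% of those steps, and that decay follows a single half-cosine down to zero. The names `lr_at`, `PEAK_LR`, `WARMUP_STEPS`, and `TOTAL_STEPS` are illustrative and are not taken from the training code.

```python
import math

PEAK_LR = 0.03        # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 122600  # assumption: final step reported in the results table

# Assumption: warmup length is the warmup ratio times the total step count.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (linear warmup, then cosine decay)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from the peak down to 0.
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, WARMUP_STEPS // 2, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(f"step {step:>6}: lr = {lr_at(step):.6f}")
```

Under these assumptions the learning rate peaks at the end of epoch 2 (step 12260) and decays toward zero by the last step.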

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.3298	1.0	6130	0.3221	13256608
0.2759	2.0	12260	0.3199	26510112
0.2767	3.0	18390	0.3203	39755376
0.3458	4.0	24520	0.3165	53010912
0.3172	5.0	30650	0.3173	66248576
0.3261	6.0	36780	0.3177	79495984
0.3408	7.0	42910	0.3164	92713360
0.3469	8.0	49040	0.3167	105934480
0.3889	9.0	55170	0.3194	119164864
0.2609	10.0	61300	0.3174	132392640
0.4061	11.0	67430	0.3280	145641920
0.2662	12.0	73560	0.3165	158902432
0.3582	13.0	79690	0.3145	172144032
0.3373	14.0	85820	0.3180	185378480
0.3475	15.0	91950	0.2939	198621168
0.2775	16.0	98080	0.2842	211855376
0.2925	17.0	104210	0.2841	225105296
0.3079	18.0	110340	0.2830	238352272
0.3540	19.0	116470	0.2837	251594480
0.3419	20.0	122600	0.2838	264840880
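The table can be checked against the headline numbers with a short script. The sketch below copies the epoch, validation loss, and token columns from the table, finds the epoch with the lowest validation loss, and estimates the average number of input tokens per step. The per-example estimate assumes one device and no gradient accumulation, neither of which is stated in this card.

```python
# (epoch, validation loss, input tokens seen), copied from the results table
RESULTS = [
    (1, 0.3221, 13256608), (2, 0.3199, 26510112), (3, 0.3203, 39755376),
    (4, 0.3165, 53010912), (5, 0.3173, 66248576), (6, 0.3177, 79495984),
    (7, 0.3164, 92713360), (8, 0.3167, 105934480), (9, 0.3194, 119164864),
    (10, 0.3174, 132392640), (11, 0.3280, 145641920), (12, 0.3165, 158902432),
    (13, 0.3145, 172144032), (14, 0.3180, 185378480), (15, 0.2939, 198621168),
    (16, 0.2842, 211855376), (17, 0.2841, 225105296), (18, 0.2830, 238352272),
    (19, 0.2837, 251594480), (20, 0.2838, 264840880),
]

FINAL_STEP = 122600   # last value in the Step column
TRAIN_BATCH_SIZE = 4  # train_batch_size from the hyperparameter list

# Epoch with the lowest validation loss.
best_epoch, best_loss, _ = min(RESULTS, key=lambda row: row[1])

# Average input tokens consumed per optimizer step over the whole run.
tokens_per_step = RESULTS[-1][2] / FINAL_STEP

# Assumption: single device, no gradient accumulation, so one step = one batch.
tokens_per_example = tokens_per_step / TRAIN_BATCH_SIZE

if __name__ == "__main__":
    print(f"best epoch: {best_epoch} (validation loss {best_loss:.4f})")
    print(f"avg tokens per step: {tokens_per_step:.1f}")
    print(f"avg tokens per example (assumed): {tokens_per_example:.1f}")
```

The lowest validation loss, 0.2830, occurs at epoch 18 and matches the evaluation loss reported at the top of the card, while the reported token count corresponds to the end of epoch 20.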

Framework versions

PEFT 0.15.2
Transformers 4.51.3
Pytorch 2.8.0+cu128
Datasets 3.6.0
Tokenizers 0.21.1
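Since PEFT is listed among the framework versions, the checkpoint is presumably an adapter that must be attached to the base model at load time. The following is a minimal sketch of doing so, assuming the adapter is published as `rbelanec/train_multirc_42_1762193638`, that you have access to the gated base model, and that `torch`, `transformers`, and `peft` are installed. The helper name `load_adapter_model` is hypothetical, and the imports are deferred into the function so the heavy dependencies are only needed when it is called.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
# Assumption: the adapter repository id as shown on the hub page for this card.
ADAPTER_ID = "rbelanec/train_multirc_42_1762193638"


def load_adapter_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model, attach the PEFT adapter, and return (model, tokenizer)."""
    # Deferred imports: only required when the model is actually loaded.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # bfloat16 is an assumption chosen to reduce memory; adjust for your hardware.
    base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return model, tokenizer


if __name__ == "__main__":
    model, tokenizer = load_adapter_model()
    print(type(model).__name__)
```

The prompt format used for MultiRC during fine-tuning is not documented in this card, so inputs at inference time would need to be matched to the training format separately.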

Model tree for rbelanec/train_multirc_42_1762193638

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model