train_multirc_42_1762438236

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the multirc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1316
  • Num Input Tokens Seen: 264840880
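
Since this checkpoint is a PEFT adapter on top of meta-llama/Meta-Llama-3-8B-Instruct, it can be loaded by attaching the adapter to the base model. A minimal sketch, assuming access to the gated base model and that the adapter repo id is rbelanec/train_multirc_42_1762438236:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the gated base model (requires accepting its license on the Hub).
base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Attach the fine-tuned adapter weights from this repository.
model = PeftModel.from_pretrained(base, "rbelanec/train_multirc_42_1762438236")
model.eval()
```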

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
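
As a sketch, these values map onto transformers TrainingArguments roughly as follows; the original training script is not shown on this card, so this mapping and the output directory are assumptions:

```python
from transformers import TrainingArguments

# Sketch only: the hyperparameters above expressed as TrainingArguments.
# output_dir is hypothetical; the actual training script is not part of this card.
args = TrainingArguments(
    output_dir="train_multirc_42_1762438236",
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```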

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.1685        | 1.0   | 6130   | 0.1692          | 13256608          |
| 0.0664        | 2.0   | 12260  | 0.1500          | 26510112          |
| 0.0817        | 3.0   | 18390  | 0.1415          | 39755376          |
| 0.2479        | 4.0   | 24520  | 0.1383          | 53010912          |
| 0.1819        | 5.0   | 30650  | 0.1354          | 66248576          |
| 0.1082        | 6.0   | 36780  | 0.1367          | 79495984          |
| 0.1441        | 7.0   | 42910  | 0.1316          | 92713360          |
| 0.0245        | 8.0   | 49040  | 0.1318          | 105934480         |
| 0.2177        | 9.0   | 55170  | 0.1329          | 119164864         |
| 0.0982        | 10.0  | 61300  | 0.1324          | 132392640         |
| 0.3213        | 11.0  | 67430  | 0.1341          | 145641920         |
| 0.1288        | 12.0  | 73560  | 0.1346          | 158902432         |
| 0.1889        | 13.0  | 79690  | 0.1392          | 172144032         |
| 0.3426        | 14.0  | 85820  | 0.1362          | 185378480         |
| 0.2370        | 15.0  | 91950  | 0.1357          | 198621168         |
| 0.1404        | 16.0  | 98080  | 0.1367          | 211855376         |
| 0.1051        | 17.0  | 104210 | 0.1377          | 225105296         |
| 0.2527        | 18.0  | 110340 | 0.1379          | 238352272         |
| 0.0182        | 19.0  | 116470 | 0.1388          | 251594480         |
| 0.0545        | 20.0  | 122600 | 0.1379          | 264840880         |
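
The evaluation loss reported at the top of this card (0.1316) matches the epoch-7 checkpoint, which has the lowest validation loss of the run, while the input-token count (264840880) matches the end of epoch 20; this is consistent with the lowest-loss checkpoint being kept after the full 20-epoch run.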

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1