train_conala_123_1760637663

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the conala dataset. It achieves the following results on the evaluation set:

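Loss: 0.5831 (the lowest validation loss in the training results, reached at epoch 3)
Num Input Tokens Seen: 3047552 (cumulative over all 20 epochs)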

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.03
train_batch_size: 4
eval_batch_size: 4
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
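
To make the scheduler settings concrete, the sketch below approximates the learning rate at a given optimizer step. It assumes the common shape of a cosine schedule with warmup: a linear ramp from 0 to the peak rate over the first 10% of steps, then a half-cosine decay to 0. The helper name lr_at is hypothetical, and the total of 10720 steps is taken from the last row of the training results table; the trainer's exact rounding of the warmup length may differ.

```python
import math

PEAK_LR = 0.03        # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 10720   # final step reported in the training results (20 epochs)


def lr_at(step, peak_lr=PEAK_LR, warmup_ratio=WARMUP_RATIO, total_steps=TOTAL_STEPS):
    """Approximate learning rate at an optimizer step (hypothetical helper).

    Assumes linear warmup followed by a half-cosine decay to zero.
    """
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the approximate rate at a few points in the run.
    for step in (0, 536, 1072, 5360, 10720):
        print(step, f"{lr_at(step):.5f}")
```

Under these assumptions the warmup covers the first 1072 steps, which is the first two epochs at 536 steps per epoch.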

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.8392	1.0	536	0.6565	152672
0.6713	2.0	1072	0.5939	305288
0.4895	3.0	1608	0.5831	457952
0.3873	4.0	2144	0.5936	610944
0.8018	5.0	2680	0.6043	762440
0.4785	6.0	3216	0.6082	914920
0.3683	7.0	3752	0.6279	1067520
0.4296	8.0	4288	0.6276	1220200
0.5198	9.0	4824	0.6657	1372560
0.2031	10.0	5360	0.6830	1524216
0.2693	11.0	5896	0.7332	1675880
0.2437	12.0	6432	0.7569	1828344
0.2029	13.0	6968	0.8369	1980376
0.0736	14.0	7504	0.9274	2132544
0.0333	15.0	8040	1.0068	2284440
0.0444	16.0	8576	1.0613	2436520
0.0281	17.0	9112	1.0871	2589096
0.0283	18.0	9648	1.0918	2741936
0.0485	19.0	10184	1.0929	2894976
0.0444	20.0	10720	1.0931	3047552
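
In the table, validation loss reaches its minimum early and then rises while training loss keeps falling, which is the usual sign of overfitting. The sketch below finds the epoch with the lowest validation loss from the values copied out of the table. The names VAL_LOSS and best_epoch are hypothetical and are not part of the training code.

```python
# Validation loss per epoch (epochs 1 to 20), copied from the table.
VAL_LOSS = [
    0.6565, 0.5939, 0.5831, 0.5936, 0.6043,
    0.6082, 0.6279, 0.6276, 0.6657, 0.6830,
    0.7332, 0.7569, 0.8369, 0.9274, 1.0068,
    1.0613, 1.0871, 1.0918, 1.0929, 1.0931,
]


def best_epoch(val_losses):
    """Return (1-based epoch, loss) for the lowest validation loss."""
    idx = min(range(len(val_losses)), key=val_losses.__getitem__)
    return idx + 1, val_losses[idx]


if __name__ == "__main__":
    epoch, loss = best_epoch(VAL_LOSS)
    print(f"lowest validation loss {loss} at epoch {epoch}")
```

The minimum matches the evaluation loss reported at the top of the card. The card does not state which checkpoint was published, so treating the epoch with the lowest validation loss as the released one is an assumption.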

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
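
Since PEFT is among the listed frameworks, the repository presumably holds an adapter to be applied on top of the base model, not a full set of weights. A minimal loading sketch under that assumption follows. The helper name load_model is hypothetical, the adapter id rbelanec/train_conala_123_1760637663 is the repository id as it appears on this card, and access to the gated base model is assumed. Calling the function downloads the 8B base weights.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # base model named in this card
ADAPTER_ID = "rbelanec/train_conala_123_1760637663"    # repository id from this card


def load_model(base_model_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the adapter (hypothetical helper).

    Imports are done lazily so that defining this function does not
    require transformers or peft to be installed.
    """
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base = AutoModelForCausalLM.from_pretrained(base_model_id)
    # Wrap the base model with the adapter weights from the adapter repository.
    model = PeftModel.from_pretrained(base, adapter_id)
    return tokenizer, model
```

Dtype and device placement are left at their defaults here; on real hardware they would likely need to be set explicitly.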

Model tree for rbelanec/train_conala_123_1760637663

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model