train_conala_456_1760637781

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the conala dataset. It achieves the following results on the evaluation set:

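Loss: 0.6149 (the lowest validation loss in the training results table, reached at epoch 3; the validation loss after the final epoch 20 is 1.6192)
Num Input Tokens Seen: 3043720 (cumulative over all 20 epochs)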

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 456
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08, no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
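
To make the interaction between learning_rate, lr_scheduler_type, and lr_scheduler_warmup_ratio concrete, the following is a minimal sketch of a cosine schedule with linear warmup. It assumes the schedule follows the usual warmup-then-half-cosine shape and that the total number of optimizer steps is 10720, the final value of the Step column in the training results table. The function name lr_at and the constants are illustrative and not part of the original training script.

```python
import math

BASE_LR = 5e-05       # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 10720   # assumption: final Step value in the training results table


def warmup_steps(total_steps=TOTAL_STEPS, warmup_ratio=WARMUP_RATIO):
    """Number of linear warmup steps implied by the warmup ratio."""
    return math.ceil(total_steps * warmup_ratio)


def lr_at(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_ratio=WARMUP_RATIO):
    """Learning rate at a given optimizer step (hypothetical helper)."""
    n_warmup = warmup_steps(total_steps, warmup_ratio)
    if step < n_warmup:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, n_warmup)
    # Half-cosine decay from base_lr down to 0 over the remaining steps.
    progress = (step - n_warmup) / max(1, total_steps - n_warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # Print the learning rate at the end of a few epochs (536 steps per epoch).
    for epoch in (1, 2, 3, 10, 20):
        print(epoch, lr_at(epoch * 536))
```

Under these assumptions the learning rate peaks at the end of warmup and then decays toward zero by the last step, so the later epochs are trained with a much smaller step size than the early ones.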

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.7262	1.0	536	0.6557	152088
0.5159	2.0	1072	0.6244	303784
0.4052	3.0	1608	0.6149	456552
0.4489	4.0	2144	0.6298	608704
0.265	5.0	2680	0.6913	761184
0.267	6.0	3216	0.8024	912912
0.2017	7.0	3752	0.8242	1065128
0.1384	8.0	4288	0.9914	1216496
0.1246	9.0	4824	1.1121	1368880
0.0158	10.0	5360	1.2298	1522016
0.0442	11.0	5896	1.2697	1674136
0.0639	12.0	6432	1.3826	1826160
0.0507	13.0	6968	1.3614	1978984
0.002	14.0	7504	1.4677	2130656
0.0498	15.0	8040	1.4448	2282720
0.0431	16.0	8576	1.5106	2434896
0.0368	17.0	9112	1.5431	2586968
0.0186	18.0	9648	1.5920	2738448
0.0082	19.0	10184	1.6197	2891056
0.0034	20.0	10720	1.6192	3043720
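
The table shows validation loss bottoming out early while training loss keeps falling, which is the usual signature of overfitting. As a small sketch of how to read this off programmatically, the snippet below copies the epoch, training loss, and validation loss columns from the table and picks the epoch with the lowest validation loss. The names ROWS, best_epoch, and generalization_gap are illustrative only.

```python
# (epoch, training_loss, validation_loss) copied from the training results table
ROWS = [
    (1, 0.7262, 0.6557), (2, 0.5159, 0.6244), (3, 0.4052, 0.6149),
    (4, 0.4489, 0.6298), (5, 0.265, 0.6913), (6, 0.267, 0.8024),
    (7, 0.2017, 0.8242), (8, 0.1384, 0.9914), (9, 0.1246, 1.1121),
    (10, 0.0158, 1.2298), (11, 0.0442, 1.2697), (12, 0.0639, 1.3826),
    (13, 0.0507, 1.3614), (14, 0.002, 1.4677), (15, 0.0498, 1.4448),
    (16, 0.0431, 1.5106), (17, 0.0368, 1.5431), (18, 0.0186, 1.5920),
    (19, 0.0082, 1.6197), (20, 0.0034, 1.6192),
]


def best_epoch(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


def generalization_gap(rows):
    """Return (epoch, validation_loss - training_loss) for every row."""
    return [(epoch, val - train) for epoch, train, val in rows]


if __name__ == "__main__":
    epoch, train_loss, val_loss = best_epoch(ROWS)
    print(f"best epoch: {epoch}, validation loss: {val_loss}")
    for epoch, gap in generalization_gap(ROWS):
        print(epoch, round(gap, 4))
```

The lowest validation loss is at epoch 3, which matches the evaluation loss reported at the top of this card; after that point the gap between validation and training loss widens with every few epochs.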

Framework versions

PEFT 0.17.1
Transformers 4.51.3
PyTorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
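
Since PEFT appears in the framework list, the checkpoint is presumably a PEFT adapter that is applied on top of the base model rather than a full set of weights. The following is a hedged sketch of how such an adapter is typically loaded. It assumes the adapter is published under the repository ID rbelanec/train_conala_456_1760637781, that you have access to the base model meta-llama/Meta-Llama-3-8B-Instruct, and that the transformers and peft packages are installed. The function name load_model is illustrative, and the card does not document the prompt format used for fine-tuning, so none is shown here.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_conala_456_1760637781"  # assumption: adapter repository ID


def load_model(base_model_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the PEFT adapter (hypothetical helper)."""
    # Imported lazily so the heavy dependencies are only needed when loading.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id, torch_dtype="auto")
    # Wrap the base model with the adapter weights from the adapter repository.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model


# Example usage (downloads several gigabytes of weights):
# tokenizer, model = load_model()
```

Pinning the library versions listed above is a reasonable starting point if loading fails with newer releases.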

Model tree for rbelanec/train_conala_456_1760637781

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model