train_conala_42_1760637550

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the conala dataset. It achieves the following results on the evaluation set:

Loss: 0.6222
Num Input Tokens Seen: 3049984

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.7654	1.0	536	0.6617	153352
0.5783	2.0	1072	0.6333	305496
0.3056	3.0	1608	0.6222	458160
0.3788	4.0	2144	0.6655	610584
0.2578	5.0	2680	0.7357	763216
0.1578	6.0	3216	0.8164	915528
0.1493	7.0	3752	0.9532	1067904
0.086	8.0	4288	1.0023	1221016
0.0326	9.0	4824	1.1793	1373032
0.3721	10.0	5360	1.1817	1525104
0.0623	11.0	5896	1.3161	1677680
0.0618	12.0	6432	1.3406	1830200
0.0411	13.0	6968	1.4055	1982664
0.0253	14.0	7504	1.4741	2135168
0.0134	15.0	8040	1.5132	2287232
0.0068	16.0	8576	1.5603	2438992
0.0071	17.0	9112	1.5544	2591432
0.0423	18.0	9648	1.6100	2744944
0.0006	19.0	10184	1.6332	2897552
0.0005	20.0	10720	1.6344	3049984

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4

Downloads last month: -

Model tree for rbelanec/train_conala_42_1760637550

Base model

meta-llama/Meta-Llama-3-8B-Instruct

Adapter

(2187)

this model