# train_conala_42_1760637548
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the conala dataset. It achieves the following results on the evaluation set (a usage sketch follows the results):
- Loss: 0.6289
- Num Input Tokens Seen: 3049984
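The framework versions below list PEFT, so this checkpoint is presumably a PEFT adapter on top of the base model. Under that assumption, a minimal inference sketch might look like the following; the repo ids are taken from this card, while device placement and generation settings are illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_conala_42_1760637548"

# Load the (gated) base model, then attach the fine-tuned adapter from this repo.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

# CoNaLa pairs natural-language intents with short Python snippets, so a
# code-oriented prompt is the natural use case.
messages = [{"role": "user", "content": "How do I sort a list of dicts by the 'name' key?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```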
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a hedged mapping to `TrainingArguments` follows the list):
- learning_rate: 0.03
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
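For illustration, these values map onto `transformers.TrainingArguments` roughly as sketched below. This is a reconstruction from the list above, not the original training script; the per-device batch-size interpretation is an assumption, and the PEFT-specific configuration is not shown.

```python
from transformers import TrainingArguments

# Hedged reconstruction of the hyperparameters listed above; field names follow
# transformers 4.51.3, but the actual training script may have differed.
args = TrainingArguments(
    output_dir="train_conala_42_1760637548",
    learning_rate=0.03,                  # unusually high for full fine-tuning; consistent with prompt-style PEFT
    per_device_train_batch_size=4,       # assumes "train_batch_size" means per device
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",                 # betas=(0.9, 0.999) and epsilon=1e-08 are the defaults
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```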
### Training results

Validation loss bottoms out at 0.6289 in epoch 3 and rises steadily from epoch 7 onward, indicating overfitting; the headline loss above matches this epoch-3 minimum, which suggests the best checkpoint was restored at the end of training.
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.8156 | 1.0 | 536 | 0.6592 | 153352 |
| 0.6364 | 2.0 | 1072 | 0.6650 | 305496 |
| 0.3716 | 3.0 | 1608 | 0.6289 | 458160 |
| 0.4885 | 4.0 | 2144 | 0.6445 | 610584 |
| 0.4551 | 5.0 | 2680 | 0.6310 | 763216 |
| 0.324 | 6.0 | 3216 | 0.6467 | 915528 |
| 0.4477 | 7.0 | 3752 | 0.6731 | 1067904 |
| 0.303 | 8.0 | 4288 | 0.7115 | 1221016 |
| 0.2757 | 9.0 | 4824 | 0.7377 | 1373032 |
| 0.7593 | 10.0 | 5360 | 0.7847 | 1525104 |
| 0.2469 | 11.0 | 5896 | 0.8181 | 1677680 |
| 0.1783 | 12.0 | 6432 | 0.9106 | 1830200 |
| 0.1514 | 13.0 | 6968 | 1.0473 | 1982664 |
| 0.0633 | 14.0 | 7504 | 1.0969 | 2135168 |
| 0.0299 | 15.0 | 8040 | 1.1955 | 2287232 |
| 0.0265 | 16.0 | 8576 | 1.2349 | 2438992 |
| 0.02 | 17.0 | 9112 | 1.2547 | 2591432 |
| 0.0704 | 18.0 | 9648 | 1.2614 | 2744944 |
| 0.0371 | 19.0 | 10184 | 1.2645 | 2897552 |
| 0.0153 | 20.0 | 10720 | 1.2632 | 3049984 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4