train_conala_123_1760637666

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the conala dataset. It achieves the following results on the evaluation set:

Loss: 0.5742
Num Input Tokens Seen: 3047552

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 123
optimizer: Use adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.8413	1.0	536	0.6011	152672
0.6224	2.0	1072	0.5773	305288
0.4317	3.0	1608	0.5742	457952
0.2788	4.0	2144	0.5962	610944
0.4801	5.0	2680	0.6492	762440
0.2656	6.0	3216	0.7369	914920
0.1983	7.0	3752	0.8697	1067520
0.1768	8.0	4288	0.9656	1220200
0.11	9.0	4824	1.0390	1372560
0.0316	10.0	5360	1.0801	1524216
0.0699	11.0	5896	1.2046	1675880
0.0556	12.0	6432	1.1798	1828344
0.0768	13.0	6968	1.2252	1980376
0.0248	14.0	7504	1.2389	2132544
0.0014	15.0	8040	1.3620	2284440
0.018	16.0	8576	1.4280	2436520
0.0004	17.0	9112	1.4676	2589096
0.0019	18.0	9648	1.5184	2741936
0.0222	19.0	10184	1.5373	2894976
0.0093	20.0	10720	1.5436	3047552

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4

Downloads last month: 1

Model tree for rbelanec/train_conala_123_1760637666

Base model

meta-llama/Meta-Llama-3-8B-Instruct

Adapter

(2392)

this model