train_conala_456_1760637778

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the CoNaLa (conala) dataset. It achieves the following results on the evaluation set:

  • Loss: 1.2150
  • Num Input Tokens Seen: 2706152
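
The card does not include usage instructions, but since the framework versions below list PEFT, a minimal loading sketch might look like the following. This is an assumption-laden example, not the author's documented usage: it assumes this repo hosts a PEFT adapter for the gated base model (for which you need access), and the example prompt is illustrative of CoNaLa-style intent-to-code queries.

```python
# Minimal loading sketch (assumption: this repo is a PEFT adapter on top of
# meta-llama/Meta-Llama-3-8B-Instruct; base-model access is required).
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained(
    "rbelanec/train_conala_456_1760637778",  # adapter repo id from this card
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# The adapter repo may not ship a tokenizer; load it from the base model.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# CoNaLa-style prompt: natural-language intent -> Python snippet.
messages = [{"role": "user", "content": "How do I sort a list of dicts by the 'name' key?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
out = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```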

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
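
These settings correspond to standard transformers.TrainingArguments fields. A hypothetical sketch of that mapping follows; the output_dir value and the use of the Trainer API are assumptions, not taken from the card:

```python
# Hypothetical reconstruction of the listed hyperparameters as
# transformers.TrainingArguments (not the author's actual script).
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_conala_456_1760637778",  # assumed name
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",       # betas=(0.9, 0.999), eps=1e-08 are AdamW defaults
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```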

Training results

Training Loss   Epoch   Step   Validation Loss   Input Tokens Seen
0.8022           2.0     952   0.8614              270552
0.7902           4.0    1904   0.7434              540824
0.5533           6.0    2856   0.7248              812136
0.4689           8.0    3808   0.7860             1082976
0.3529          10.0    4760   0.8404             1354056
0.2650          12.0    5712   0.9470             1624392
0.2268          14.0    6664   1.0683             1894560
0.1057          16.0    7616   1.1523             2165056
0.2383          18.0    8568   1.2047             2435976
0.2193          20.0    9520   1.2150             2706152
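
Note that validation loss bottoms out at 0.7248 around epoch 6 and rises steadily afterward while training loss keeps falling, a pattern consistent with overfitting to the training split in later epochs.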

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
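
For reproducibility, the stack can be pinned to the versions above. A requirements-style sketch (assumption: plain version pins suffice; the CUDA 12.8 build of torch, shown here as 2.9.0+cu128 on the card, is installed from the PyTorch wheel index rather than PyPI):

```
# requirements.txt sketch reconstructed from the versions listed above
peft==0.17.1
transformers==4.51.3
torch==2.9.0        # card reports 2.9.0+cu128; use the PyTorch cu128 wheel index
datasets==4.0.0
tokenizers==0.21.4
```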