train_conala_789_1760637895

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the conala dataset. It achieves the following results on the evaluation set:

Loss: 0.6264 (the lowest validation loss of the run, reached at epoch 3)
Num Input Tokens Seen: 3037136 (total after all 20 epochs)
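
For readers who prefer perplexity to raw loss, the conversion is a single exponential. This is a minimal sketch, assuming the reported loss is a mean per-token cross-entropy in nats; the card itself does not state this.

```python
import math

# Evaluation loss reported in this card.
eval_loss = 0.6264

# Assumption: the loss is a mean per-token cross-entropy in nats,
# so exp(loss) gives the per-token perplexity.
perplexity = math.exp(eval_loss)
print(f"perplexity: {perplexity:.2f}")
```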

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
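
To make the scheduler settings concrete, the sketch below computes the learning rate at a few steps. It assumes linear warmup from zero followed by a half-cosine decay to zero, which is a common way a cosine schedule with warmup is implemented; the card does not confirm the exact formula. The total step count of 10720 is taken from the last row of the training results table, and `lr_at` is a hypothetical helper name.

```python
import math

PEAK_LR = 5e-05        # learning_rate from the list above
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 10720    # final step in the training results table

# Assumption: warmup length is the ratio applied to the total step count.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given step under the assumed schedule."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Half-cosine decay from the peak learning rate down to 0.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, 536, WARMUP_STEPS, 5360, TOTAL_STEPS):
    print(step, f"{lr_at(step):.2e}")
```

Under these assumptions warmup covers the first 1072 steps, which is exactly the first two epochs at 536 steps per epoch.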

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.5855	1.0	536	0.6764	152296
0.8375	2.0	1072	0.6356	304440
0.6417	3.0	1608	0.6264	455928
0.4542	4.0	2144	0.6428	608072
0.3091	5.0	2680	0.6868	759296
0.2074	6.0	3216	0.7563	910984
0.1068	7.0	3752	0.8662	1062816
0.089	8.0	4288	0.9246	1214520
0.0498	9.0	4824	1.1260	1366480
0.0517	10.0	5360	1.1372	1518976
0.0426	11.0	5896	1.1920	1670320
0.0296	12.0	6432	1.2391	1822624
0.0017	13.0	6968	1.2560	1974336
0.0017	14.0	7504	1.3302	2126488
0.001	15.0	8040	1.4223	2278280
0.0219	16.0	8576	1.4497	2430272
0.0007	17.0	9112	1.4627	2581848
0.0058	18.0	9648	1.5097	2733712
0.0248	19.0	10184	1.5317	2885208
0.0009	20.0	10720	1.5310	3037136
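
In the table above, validation loss bottoms out early and then rises while training loss keeps falling toward zero, a pattern consistent with overfitting. The sketch below picks the epoch with the lowest validation loss from the same numbers. The values are copied from the table; the variable names are illustrative only.

```python
# (epoch, training loss, validation loss) copied from the table above.
rows = [
    (1, 0.5855, 0.6764), (2, 0.8375, 0.6356), (3, 0.6417, 0.6264),
    (4, 0.4542, 0.6428), (5, 0.3091, 0.6868), (6, 0.2074, 0.7563),
    (7, 0.1068, 0.8662), (8, 0.089, 0.9246), (9, 0.0498, 1.1260),
    (10, 0.0517, 1.1372), (11, 0.0426, 1.1920), (12, 0.0296, 1.2391),
    (13, 0.0017, 1.2560), (14, 0.0017, 1.3302), (15, 0.001, 1.4223),
    (16, 0.0219, 1.4497), (17, 0.0007, 1.4627), (18, 0.0058, 1.5097),
    (19, 0.0248, 1.5317), (20, 0.0009, 1.5310),
]

# Epoch with the lowest validation loss.
best_epoch, _, best_val = min(rows, key=lambda r: r[2])
print(f"best epoch: {best_epoch}, validation loss: {best_val}")

# Gap between validation and training loss at the final epoch.
final_epoch, final_train, final_val = rows[-1]
print(f"epoch {final_epoch} gap: {final_val - final_train:.4f}")
```

The selected epoch's validation loss matches the evaluation loss reported at the top of the card.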

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
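
When reproducing the run, it can help to check the local environment against these versions. The following is a minimal sketch using only the standard library. The mapping from the names listed above to installable package names (for example, "Pytorch" to `torch`) is an assumption, and `compare_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions listed in this card. The distribution names are an assumption
# (for example, "Pytorch" is taken to mean the "torch" package).
EXPECTED = {
    "peft": "0.17.1",
    "transformers": "4.51.3",
    "torch": "2.9.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}


def compare_versions(expected):
    """Map each package to (wanted, installed or None, exact match)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in compare_versions(EXPECTED).items():
    status = "ok" if ok else "differs"
    print(f"{name}: wanted {wanted}, installed {installed} ({status})")
```

An exact string match is strict; a different CUDA build suffix on the PyTorch version, for instance, will be reported as a difference.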

Model tree for rbelanec/train_conala_789_1760637895

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model