# train_conala_123_1760637667
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the conala dataset. It achieves the following results on the evaluation set:
- Loss: 2.7170
- Num Input Tokens Seen: 3047552
## Model description
More information needed
## Intended uses & limitations
More information needed
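Since PEFT is listed under the framework versions below, this checkpoint is presumably a PEFT adapter rather than a full model. The following is a minimal loading sketch under that assumption; access to the gated Llama-3 base model is required, and the repo ids are taken from this card:

```python
# Minimal sketch: attach the PEFT adapter to the base model and generate.
# Assumes this repo hosts a PEFT adapter (PEFT appears under
# "Framework versions" below).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_conala_123_1760637667"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

# CoNaLa-style prompt: natural-language intent -> Python snippet.
prompt = "How do I reverse a list in Python?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```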
## Training and evaluation data
More information needed. (Based on the dataset name, this is presumably the CoNaLa corpus, which pairs natural-language intents with Python snippets mined from Stack Overflow.)
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 123
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
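The training script itself is not part of this card. As a rough guide, here is how the hyperparameters above would map onto `transformers.TrainingArguments`; `output_dir` and all wiring not listed above are placeholders:

```python
# Sketch only: the hyperparameters above expressed as TrainingArguments.
# output_dir is a placeholder; dataset and Trainer setup are omitted.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_conala_123_1760637667",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```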
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 3.2803 | 1.0 | 536 | 2.7423 | 152672 |
| 2.8697 | 2.0 | 1072 | 2.7333 | 305288 |
| 2.9412 | 3.0 | 1608 | 2.7242 | 457952 |
| 2.5292 | 4.0 | 2144 | 2.7203 | 610944 |
| 3.0791 | 5.0 | 2680 | 2.7200 | 762440 |
| 2.6249 | 6.0 | 3216 | 2.7178 | 914920 |
| 3.2274 | 7.0 | 3752 | 2.7170 | 1067520 |
| 3.1557 | 8.0 | 4288 | 2.7171 | 1220200 |
| 3.0671 | 9.0 | 4824 | 2.7177 | 1372560 |
| 3.5507 | 10.0 | 5360 | 2.7179 | 1524216 |
| 3.2952 | 11.0 | 5896 | 2.7186 | 1675880 |
| 2.775 | 12.0 | 6432 | 2.7180 | 1828344 |
| 3.2292 | 13.0 | 6968 | 2.7170 | 1980376 |
| 2.4894 | 14.0 | 7504 | 2.7172 | 2132544 |
| 2.4565 | 15.0 | 8040 | 2.7187 | 2284440 |
| 2.7554 | 16.0 | 8576 | 2.7193 | 2436520 |
| 2.4938 | 17.0 | 9112 | 2.7197 | 2589096 |
| 3.0238 | 18.0 | 9648 | 2.7175 | 2741936 |
| 2.9629 | 19.0 | 10184 | 2.7172 | 2894976 |
| 3.0866 | 20.0 | 10720 | 2.7171 | 3047552 |
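The validation loss plateaus around 2.717 from epoch 7 onward; the reported evaluation loss of 2.7170 matches the best checkpoints (epochs 7 and 13), which suggests the best model was kept rather than the final one.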
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4