train_siqa_789_1760637942

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the siqa (Social IQa) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.9188
  • Num Input Tokens Seen: 53569152
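
Since this is a PEFT adapter on top of Meta-Llama-3-8B-Instruct, it is used by loading the base model and applying the adapter. Below is a minimal sketch, assuming the adapter is published as rbelanec/train_siqa_789_1760637942 and that you have access to the gated base weights; the example prompt is only illustrative, since the card does not document the prompt template used during training.

```python
# Minimal sketch: load the base model and apply this PEFT adapter.
# Requires: transformers, peft, accelerate (for device_map="auto"),
# and access to the gated Meta-Llama-3 base weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_siqa_789_1760637942"  # adapter repo from this card

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

# Hypothetical SIQA-style social-commonsense prompt; the actual training
# prompt format is not documented in this card.
prompt = "Question: Alex spilled coffee on Jordan's notes. How would Jordan feel?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```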

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

The model was fine-tuned and evaluated on the siqa (Social IQa) dataset; split and preprocessing details are not documented.

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch in code follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 789
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
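
For reference, here is a minimal sketch of how these values map onto transformers.TrainingArguments. The dataset loading, PEFT/LoRA configuration, and Trainer setup are not documented in this card and are omitted.

```python
# Sketch only: maps the hyperparameters listed above onto TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_siqa_789_1760637942",  # assumed output directory name
    learning_rate=1e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=789,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```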

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.5488        | 2.0   | 13364  | 0.5495          | 5356704           |
| 0.3478        | 4.0   | 26728  | 0.4579          | 10714656          |
| 0.1526        | 6.0   | 40092  | 0.2521          | 16072768          |
| 0.1181        | 8.0   | 53456  | 0.2773          | 21430432          |
| 0.1266        | 10.0  | 66820  | 0.3287          | 26786144          |
| 0.0015        | 12.0  | 80184  | 0.4566          | 32142016          |
| 0.0002        | 14.0  | 93548  | 0.5701          | 37497856          |
| 0.0002        | 16.0  | 106912 | 0.7638          | 42853376          |
| 0.0           | 18.0  | 120276 | 0.8669          | 48211808          |
| 0.0           | 20.0  | 133640 | 0.9188          | 53569152          |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4