train_qqp_1755694487

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the qqp dataset. It achieves the following results on the evaluation set:

Loss: 0.2502
Num Input Tokens Seen: 227659432

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10.0
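
The learning-rate settings above can be turned into a concrete schedule. The following is a minimal sketch, not the training code itself: it assumes linear warmup followed by a single half-cosine decay to zero (the usual meaning of a "cosine" scheduler with a warmup ratio), and it assumes 163732 optimizer steps per epoch, a figure read off the training results table rather than stated among the hyperparameters. The names `lr_at`, `STEPS_PER_EPOCH`, `TOTAL_STEPS`, and `WARMUP_STEPS` are illustrative.

```python
import math

BASE_LR = 5e-5            # learning_rate from the card
WARMUP_RATIO = 0.1        # lr_scheduler_warmup_ratio from the card
NUM_EPOCHS = 10           # num_epochs from the card
STEPS_PER_EPOCH = 163732  # assumption: step count logged at epoch 1.0 in the results table

TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS
# Assumption: warmup length is the warmup ratio applied to the total step count.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step under warmup + cosine decay."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the base learning rate.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from the base learning rate down to 0.
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(f"step {s:>8}: lr = {lr_at(s):.3e}")
```

Under these assumptions the warmup would span roughly the first epoch, and the learning rate would be at half its peak around step 900526, which the results table logs as epoch 5.5.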

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.1102	0.5000	81866	0.1892	11386496
0.3185	1.0000	163732	0.2469	22764472
0.1657	1.5000	245598	0.2319	34144408
0.2638	2.0000	327464	0.2277	45529424
0.368	2.5000	409330	0.2274	56915424
0.2096	3.0000	491196	0.2207	68299488
0.2271	3.5000	573062	0.2185	79670992
0.1485	4.0000	654928	0.2099	91066456
0.2115	4.5000	736794	0.2321	102449336
0.1581	5.0000	818660	0.2007	113829176
0.4332	5.5000	900526	0.2162	125219848
0.1755	6.0000	982392	0.2028	136600616
0.1544	6.5000	1064258	0.2058	147981640
0.2321	7.0000	1146124	0.2088	159365688
0.0797	7.5000	1227990	0.2152	170758584
0.1285	8.0000	1309856	0.2110	182133096
0.1539	8.5001	1391722	0.2257	193504584
0.473	9.0001	1473588	0.2346	204895744
0.5513	9.5001	1555454	0.2482	216280368
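
To make the table easier to interpret, the sketch below picks out the evaluation with the lowest validation loss and checks whether the loss rises over the final evaluations. The (epoch, step, validation loss) values are copied from the table above; the variable names and the choice of a four-evaluation window are illustrative assumptions, not part of the original training setup.

```python
# (epoch, step, validation_loss) copied from the training results table.
ROWS = [
    (0.5, 81866, 0.1892), (1.0, 163732, 0.2469), (1.5, 245598, 0.2319),
    (2.0, 327464, 0.2277), (2.5, 409330, 0.2274), (3.0, 491196, 0.2207),
    (3.5, 573062, 0.2185), (4.0, 654928, 0.2099), (4.5, 736794, 0.2321),
    (5.0, 818660, 0.2007), (5.5, 900526, 0.2162), (6.0, 982392, 0.2028),
    (6.5, 1064258, 0.2058), (7.0, 1146124, 0.2088), (7.5, 1227990, 0.2152),
    (8.0, 1309856, 0.2110), (8.5, 1391722, 0.2257), (9.0, 1473588, 0.2346),
    (9.5, 1555454, 0.2482),
]

# Evaluation with the lowest validation loss.
best = min(ROWS, key=lambda row: row[2])

# Validation losses of the last four evaluations (window size is an arbitrary choice).
tail = [loss for _, _, loss in ROWS[-4:]]
rising_at_end = all(a < b for a, b in zip(tail, tail[1:]))

if __name__ == "__main__":
    print(f"lowest validation loss: {best[2]} at epoch {best[0]} (step {best[1]})")
    print(f"validation loss rising over the last four evaluations: {rising_at_end}")
```

On these numbers the lowest logged validation loss occurs at the first evaluation, and the loss increases steadily from epoch 8.0 onward, so an earlier checkpoint may be preferable to the final one if such checkpoints were saved.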

Framework versions

PEFT 0.15.2
Transformers 4.51.3
Pytorch 2.8.0+cu128
Datasets 3.6.0
Tokenizers 0.21.1
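
Since the card lists PEFT and Transformers, loading the model would presumably mean attaching the adapter to the base model. The following is a hedged sketch under that assumption: the repository id is taken from the "Model tree" line of the model page, the helper name `load_adapter_model` is hypothetical, and the card does not document a prompt format, so none is shown. Access to the base model may require accepting its license on the Hub.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # base model named in the card
ADAPTER_ID = "rbelanec/train_qqp_1755694487"           # repository id from the model page


def load_adapter_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the fine-tuned adapter (downloads weights)."""
    # Imported lazily so this file can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights stored in the adapter repository.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_adapter_model()` downloads the full 8B base model, so it is only practical on a machine with enough memory; the function is left uncalled here for that reason.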

Model tree for rbelanec/train_qqp_1755694487

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model