train_wsc_789_1760360866

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

Loss: 0.3413
Num Input Tokens Seen: 1462816

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.03
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 30
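To make the learning-rate settings concrete, the following is a minimal sketch of a linear warmup followed by cosine decay, using the learning_rate and lr_scheduler_warmup_ratio values listed above. The total step count is an assumption: it is inferred from the results table (step 188 falls at epoch 1.504, which implies 125 optimizer steps per epoch) multiplied by num_epochs. The function name cosine_lr is illustrative, and the scheduler used in the actual run may differ in detail.

```python
import math

BASE_LR = 0.03        # learning_rate from the card
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the card

# Assumption: 125 optimizer steps per epoch (inferred from the results
# table, e.g. step 188 at epoch 1.504) times num_epochs = 30.
TOTAL_STEPS = 125 * 30
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def cosine_lr(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR over the warmup steps.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from BASE_LR down to 0.
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for s in (0, 188, WARMUP_STEPS, 1692, TOTAL_STEPS):
        print(s, cosine_lr(s))
```

Under these assumptions the rate peaks at 0.03 once warmup ends and decays toward zero by the final step.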

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.3493	1.504	188	0.3545	73440
0.3662	3.008	376	0.3573	147296
0.3506	4.512	564	0.3427	221744
0.353	6.016	752	0.3442	293760
0.3303	7.52	940	0.3436	367808
0.3423	9.024	1128	0.3454	440288
0.3528	10.528	1316	0.3445	512864
0.3418	12.032	1504	0.3475	587120
0.3443	13.536	1692	0.3413	660352
0.3507	15.04	1880	0.3509	733072
0.3418	16.544	2068	0.3469	805824
0.3524	18.048	2256	0.3445	880160
0.3437	19.552	2444	0.3452	955488
0.3503	21.056	2632	0.3447	1028544
0.3525	22.56	2820	0.3506	1101536
0.353	24.064	3008	0.3454	1174032
0.3437	25.568	3196	0.3466	1247296
0.3593	27.072	3384	0.3474	1320640
0.3436	28.576	3572	0.3482	1394256
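The table above can be checked with a few lines of Python. This sketch copies the epoch, step, and validation-loss columns, derives the number of optimizer steps per epoch, and locates the row with the lowest validation loss. The variable names are illustrative. The lowest validation loss coincides with the evaluation loss reported at the top of the card, which suggests (but does not confirm) that the best checkpoint was the one kept.

```python
# (epoch, step, validation_loss) copied from the results table above.
ROWS = [
    (1.504, 188, 0.3545),
    (3.008, 376, 0.3573),
    (4.512, 564, 0.3427),
    (6.016, 752, 0.3442),
    (7.52, 940, 0.3436),
    (9.024, 1128, 0.3454),
    (10.528, 1316, 0.3445),
    (12.032, 1504, 0.3475),
    (13.536, 1692, 0.3413),
    (15.04, 1880, 0.3509),
    (16.544, 2068, 0.3469),
    (18.048, 2256, 0.3445),
    (19.552, 2444, 0.3452),
    (21.056, 2632, 0.3447),
    (22.56, 2820, 0.3506),
    (24.064, 3008, 0.3454),
    (25.568, 3196, 0.3466),
    (27.072, 3384, 0.3474),
    (28.576, 3572, 0.3482),
]

REPORTED_EVAL_LOSS = 0.3413  # "Loss" value reported at the top of the card

# Every row should imply the same number of optimizer steps per epoch.
steps_per_epoch = {round(step / epoch) for epoch, step, _ in ROWS}

# Row with the lowest validation loss.
best_epoch, best_step, best_val = min(ROWS, key=lambda row: row[2])

print("steps per epoch:", steps_per_epoch)
print("best row:", best_epoch, best_step, best_val)
print("matches reported loss:", best_val == REPORTED_EVAL_LOSS)
```

With train_batch_size 4, 125 steps per epoch would correspond to roughly 500 training examples, assuming a single device and no gradient accumulation; the card states neither.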

Framework versions

PEFT 0.15.2
Transformers 4.51.3
Pytorch 2.8.0+cu128
Datasets 3.6.0
Tokenizers 0.21.1
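When reproducing results, it can help to compare the local environment against the versions listed above. The following is a small sketch using only the standard library. The PyPI package names (peft, transformers, torch, datasets, tokenizers) are the usual distribution names for these libraries, and version_mismatches is a hypothetical helper, not part of any of them.

```python
from importlib import metadata

# Versions listed in this card, keyed by PyPI distribution name.
EXPECTED = {
    "peft": "0.15.2",
    "transformers": "4.51.3",
    "torch": "2.8.0+cu128",
    "datasets": "3.6.0",
    "tokenizers": "0.21.1",
}


def version_mismatches(expected=EXPECTED, lookup=metadata.version):
    """Return {package: (wanted, found)} for every package that differs.

    `found` is None when the package is not installed. `lookup` can be
    swapped out for testing.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            found = lookup(name)
        except metadata.PackageNotFoundError:
            found = None
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, found) in version_mismatches().items():
        print(f"{name}: card lists {wanted}, environment has {found}")
```

An exact match is not necessarily required to load the model; the check only flags differences worth knowing about.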

Model tree for rbelanec/train_wsc_789_1760360866

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model
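Since this repository is an adapter on top of the base model, one plausible way to load it is through PEFT. The sketch below is an assumption about usage, not a procedure documented in this card: the card does not state the adapter type, prompt format, or recommended precision, and the base model may require accepting its license on the hub. The function name load_adapter_model is illustrative.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_wsc_789_1760360866"


def load_adapter_model(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the adapter weights (hypothetical helper).

    Imports are done lazily so that this module can be imported without
    the heavy dependencies installed.
    """
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # The tokenizer is taken from the base model.
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)

    # Wrap the base model with the adapter stored in this repository.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_adapter_model()
    print(type(model).__name__)
```

Loading an 8B-parameter model this way needs substantial memory; options such as reduced precision or device placement are left out because the card gives no guidance on them.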