4c7fe2f9b288a00f5df2661ee2c98dc7

This model is a fine-tuned version of google-bert/bert-large-cased on the stsb (STS-B) configuration of the nyu-mll/glue dataset. It achieves the following results on the evaluation set, which match the final logged epoch (19) in the training results:

Loss: 0.5591
Data Size: 1.0
Epoch Runtime: 19.7743
Mse: 0.5595
Mae: 0.5632
R2: 0.7497
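These are standard regression metrics on the STS-B similarity scores. The sketch below assumes the usual definitions (the ones scikit-learn uses for `mean_squared_error`, `mean_absolute_error` and `r2_score`); the card does not say which implementation produced the numbers. Because R2 = 1 - MSE / Var(y), with Var taken as the population variance of the labels, the reported Mse and R2 also let you back out the variance of the evaluation labels. The helper `implied_label_variance` is illustrative only.

```python
from statistics import fmean


def mse(y_true: list[float], y_pred: list[float]) -> float:
    """Mean squared error: average of squared residuals."""
    return fmean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


def mae(y_true: list[float], y_pred: list[float]) -> float:
    """Mean absolute error: average of absolute residuals."""
    return fmean([abs(t - p) for t, p in zip(y_true, y_pred)])


def r2(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def implied_label_variance(mse_value: float, r2_value: float) -> float:
    """Invert R2 = 1 - MSE / Var(y) to recover the population label variance."""
    return mse_value / (1.0 - r2_value)


if __name__ == "__main__":
    # Toy example: a single prediction off by 1.0.
    y_true, y_pred = [1.0, 2.0, 3.0], [1.0, 2.0, 4.0]
    print(mse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
    # Final-epoch metrics reported in this card.
    print(round(implied_label_variance(0.5595, 0.7497), 3))  # 2.235
```

The final results (Mse 0.5595, R2 0.7497) imply a label variance of about 2.235, and the epoch-0 row (Mse 5.7664, R2 -1.5795) gives nearly the same value, a quick check that the Mse and R2 columns are mutually consistent.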

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
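The hyperparameter list can be encoded as a small config object to make two relationships explicit: the total batch sizes are the per-device sizes multiplied by the 4 devices (which implies no gradient accumulation), and a constant scheduler keeps the learning rate at 5e-05 for every step. This is a hypothetical sketch; the `HParams` class and `lr_at_step` helper are illustrative and not part of the card or of any library. In Hugging Face Transformers these values correspond to `TrainingArguments` fields such as `learning_rate`, `per_device_train_batch_size`, `per_device_eval_batch_size`, `seed`, `optim`, `adam_beta1`, `adam_beta2`, `adam_epsilon`, `lr_scheduler_type` and `num_train_epochs`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HParams:
    """Hypothetical container mirroring the hyperparameters listed in this card."""
    learning_rate: float = 5e-05
    train_batch_size: int = 8             # per device
    eval_batch_size: int = 8              # per device
    num_devices: int = 4                  # distributed_type: multi-GPU
    gradient_accumulation_steps: int = 1  # not listed; implied by 8 x 4 = 32
    seed: int = 42
    adam_betas: tuple[float, float] = (0.9, 0.999)
    adam_epsilon: float = 1e-08
    num_epochs: int = 50

    @property
    def total_train_batch_size(self) -> int:
        # Examples consumed per optimizer step across all devices.
        return self.train_batch_size * self.num_devices * self.gradient_accumulation_steps

    @property
    def total_eval_batch_size(self) -> int:
        return self.eval_batch_size * self.num_devices

    def lr_at_step(self, step: int) -> float:
        # "constant" scheduler: no warmup and no decay, so the step is ignored.
        return self.learning_rate


if __name__ == "__main__":
    hp = HParams()
    print(hp.total_train_batch_size, hp.total_eval_batch_size)  # 32 32
    print(hp.lr_at_step(0), hp.lr_at_step(3401))
```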

Training results

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Mse	Mae	R2
No log	0	0	5.7652	0	1.6851	5.7664	1.9845	-1.5795
No log	1	179	4.5126	0.0078	2.1491	4.5137	1.7637	-1.0191
No log	2	358	3.0215	0.0156	2.4542	3.0225	1.4701	-0.3521
No log	3	537	2.2069	0.0312	2.8963	2.2076	1.2675	0.0124
No log	4	716	2.0335	0.0625	3.9247	2.0341	1.1986	0.0901
No log	5	895	1.4474	0.125	4.9995	1.4477	0.9734	0.3524
0.127	6	1074	1.0170	0.25	8.2968	1.0174	0.7717	0.5449
0.793	7	1253	0.6206	0.5	11.6572	0.6210	0.6211	0.7222
0.649	8	1432	0.6377	1.0	19.8813	0.6379	0.6257	0.7146
0.4499	9	1611	0.6489	1.0	20.0301	0.6493	0.5990	0.7095
0.3274	10	1790	0.5890	1.0	18.7848	0.5894	0.5793	0.7363
0.2746	11	1969	0.5887	1.0	19.6493	0.5890	0.5872	0.7365
0.2181	12	2148	0.5465	1.0	19.1146	0.5467	0.5608	0.7554
0.1726	13	2327	0.5697	1.0	18.9297	0.5698	0.5747	0.7451
0.1574	14	2506	0.5282	1.0	19.6140	0.5284	0.5528	0.7636
0.1326	15	2685	0.5236	1.0	19.5125	0.5240	0.5489	0.7656
0.1858	16	2864	0.5252	1.0	19.5494	0.5254	0.5471	0.7650
0.1281	17	3043	0.5447	1.0	19.6684	0.5451	0.5569	0.7562
0.113	18	3222	0.5457	1.0	19.2523	0.5461	0.5631	0.7557
0.1334	19	3401	0.5591	1.0	19.7743	0.5595	0.5632	0.7497
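The summary metrics at the top of the card come from the last logged epoch (19), not from the epoch with the best validation score. The sketch below uses values copied from the table to pick the epoch with the lowest validation loss and the one with the highest R2; both turn out to be epoch 15 (Validation Loss 0.5236, R2 0.7656). The `Row` type and `best_row` helper are illustrative only. When retraining, Transformers' `TrainingArguments` offers `load_best_model_at_end` together with `metric_for_best_model` to keep the best checkpoint automatically; the card does not say whether these options were used.

```python
from typing import NamedTuple


class Row(NamedTuple):
    epoch: int
    val_loss: float
    r2: float


# (epoch, validation loss, R2) copied from the training-results table.
LOG = [
    Row(0, 5.7652, -1.5795), Row(1, 4.5126, -1.0191), Row(2, 3.0215, -0.3521),
    Row(3, 2.2069, 0.0124), Row(4, 2.0335, 0.0901), Row(5, 1.4474, 0.3524),
    Row(6, 1.0170, 0.5449), Row(7, 0.6206, 0.7222), Row(8, 0.6377, 0.7146),
    Row(9, 0.6489, 0.7095), Row(10, 0.5890, 0.7363), Row(11, 0.5887, 0.7365),
    Row(12, 0.5465, 0.7554), Row(13, 0.5697, 0.7451), Row(14, 0.5282, 0.7636),
    Row(15, 0.5236, 0.7656), Row(16, 0.5252, 0.7650), Row(17, 0.5447, 0.7562),
    Row(18, 0.5457, 0.7557), Row(19, 0.5591, 0.7497),
]


def best_row(rows: list[Row], metric: str, minimize: bool) -> Row:
    """Return the row with the best value of `metric` ("val_loss" or "r2")."""
    choose = min if minimize else max
    return choose(rows, key=lambda r: getattr(r, metric))


if __name__ == "__main__":
    print("lowest validation loss:", best_row(LOG, "val_loss", minimize=True))
    print("highest R2:", best_row(LOG, "r2", minimize=False))
    print("final epoch (reported at the top):", LOG[-1])
```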

Framework versions

Transformers 4.57.0
PyTorch 2.8.0+cu128
Datasets 4.3.0
Tokenizers 0.22.1


Safetensors

Model size: 0.3B params
Tensor type: F32
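The model size and F32 tensor type translate into a rough memory estimate. This back-of-the-envelope sketch assumes 4 bytes per F32 value and takes the rounded 0.3B figure at face value: the weights alone need about 1.2 GB, and full-precision AdamW training (weights, gradients and the optimizer's two moment buffers, so 4 values per parameter) needs about 4.8 GB per GPU before activations, assuming each GPU holds a full replica as in standard data-parallel training. The helper names are hypothetical.

```python
# Bytes per element for common safetensors dtypes.
BYTES_PER_ELEMENT = {"F32": 4, "F16": 2, "BF16": 2}


def weight_bytes(num_params: float, dtype: str = "F32") -> float:
    """Memory for the model weights alone."""
    return num_params * BYTES_PER_ELEMENT[dtype]


def adamw_training_state_bytes(num_params: float) -> float:
    """F32 weights + gradients + AdamW exp_avg and exp_avg_sq buffers.

    Activations, CUDA context and framework overhead are not included.
    """
    values_per_param = 4  # weight, gradient, exp_avg, exp_avg_sq
    return values_per_param * weight_bytes(num_params, "F32")


if __name__ == "__main__":
    n = 0.3e9  # "0.3B params", a rounded figure
    print(f"weights: {weight_bytes(n) / 1e9:.1f} GB")  # weights: 1.2 GB
    print(f"training state: {adamw_training_state_bytes(n) / 1e9:.1f} GB")  # training state: 4.8 GB
```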

Model tree for contemmcm/4c7fe2f9b288a00f5df2661ee2c98dc7

Base model: google-bert/bert-large-cased
Finetuned: this model