# 708e64097800ca5058178d2d8221bd2c
This model is a fine-tuned version of distilbert/distilbert-base-cased on the QQP (Quora Question Pairs) subset of the nyu-mll/glue dataset. It achieves the following results on the evaluation set (final epoch):
- Loss: 0.4029
- Data Size: 1.0
- Epoch Runtime: 314.5879
- Accuracy: 0.8854
- F1 Macro: 0.8791
- Rouge1: 0.8854
- Rouge2: 0.0
- Rougel: 0.8854
- Rougelsum: 0.8854
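For quick testing, the snippet below is a minimal inference sketch. It assumes the checkpoint is hosted on the Hugging Face Hub under `contemmcm/708e64097800ca5058178d2d8221bd2c` and that the classification head follows the usual GLUE QQP label order (0 = not_duplicate, 1 = duplicate); both are assumptions and should be checked against the hosted `config.json`.

```python
# Minimal inference sketch (repository id and label order are assumed, see note above).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "contemmcm/708e64097800ca5058178d2d8221bd2c"  # assumed Hub repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# QQP is a question-pair duplicate-detection task: encode both questions as one pair.
q1 = "How do I learn Python quickly?"
q2 = "What is the fastest way to learn Python?"
inputs = tokenizer(q1, q2, return_tensors="pt", truncation=True)

with torch.no_grad():
    logits = model(**inputs).logits

pred = logits.argmax(dim=-1).item()
print("duplicate" if pred == 1 else "not_duplicate")  # label order assumed to match GLUE QQP
```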
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50
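As a reference point, the sketch below shows roughly how these hyperparameters map onto `transformers.TrainingArguments`. It assumes the run used the standard `Trainer` on 4 GPUs (per-device batch size 8 × 4 devices = total batch size 32); it is not the original training script.

```python
# Hyperparameter sketch (assumes the standard Hugging Face Trainer was used).
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="distilbert-base-cased-qqp",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # 4 devices -> total train batch size 32
    per_device_eval_batch_size=8,    # 4 devices -> total eval batch size 32
    num_train_epochs=50,
    lr_scheduler_type="constant",
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    seed=42,
)
```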
### Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Accuracy | F1 Macro | Rouge1 | Rouge2 | Rougel | Rougelsum |
|---|---|---|---|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 0.6705 | 0 | 10.9249 | 0.6320 | 0.3872 | 0.6318 | 0.0 | 0.6319 | 0.6317 |
| 0.6127 | 1 | 11370 | 0.5143 | 0.0078 | 13.7585 | 0.7461 | 0.7305 | 0.7462 | 0.0 | 0.7461 | 0.7460 |
| 0.4709 | 2 | 22740 | 0.4546 | 0.0156 | 15.7784 | 0.7755 | 0.7645 | 0.7756 | 0.0 | 0.7755 | 0.7755 |
| 0.4248 | 3 | 34110 | 0.4101 | 0.0312 | 20.6107 | 0.8050 | 0.7923 | 0.8051 | 0.0 | 0.8048 | 0.8049 |
| 0.3922 | 4 | 45480 | 0.3822 | 0.0625 | 29.3199 | 0.8266 | 0.8162 | 0.8266 | 0.0 | 0.8266 | 0.8266 |
| 0.3479 | 5 | 56850 | 0.3522 | 0.125 | 47.4099 | 0.8371 | 0.8308 | 0.8371 | 0.0 | 0.8371 | 0.8370 |
| 0.3028 | 6 | 68220 | 0.3322 | 0.25 | 84.9991 | 0.8519 | 0.8452 | 0.8519 | 0.0 | 0.8519 | 0.8518 |
| 0.2895 | 7 | 79590 | 0.2934 | 0.5 | 159.0736 | 0.8693 | 0.8628 | 0.8695 | 0.0 | 0.8694 | 0.8692 |
| 0.2560 | 8 | 90960 | 0.2808 | 1.0 | 305.0913 | 0.8842 | 0.8766 | 0.8843 | 0.0 | 0.8842 | 0.8843 |
| 0.2016 | 9 | 102330 | 0.2957 | 1.0 | 313.2320 | 0.8861 | 0.8787 | 0.8861 | 0.0 | 0.8861 | 0.8860 |
| 0.1610 | 10 | 113700 | 0.3197 | 1.0 | 311.6922 | 0.8908 | 0.8827 | 0.8908 | 0.0 | 0.8909 | 0.8908 |
| 0.1512 | 11 | 125070 | 0.3346 | 1.0 | 311.9577 | 0.8771 | 0.8711 | 0.8772 | 0.0 | 0.8772 | 0.8772 |
| 0.1172 | 12 | 136440 | 0.4029 | 1.0 | 314.5879 | 0.8854 | 0.8791 | 0.8854 | 0.0 | 0.8854 | 0.8854 |
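The Data Size column suggests that the training subset doubled each epoch (1/128, 1/64, …, 1/2) before the full QQP training set was used from epoch 8 onward. The sketch below only reproduces that apparent fraction schedule; it is an inference from the table, not a documented detail of the training procedure.

```python
# Apparent per-epoch data fraction, inferred from the Data Size column above.
def data_fraction(epoch: int) -> float:
    """Fraction of the training set used at a given epoch (epoch 0 = initial evaluation only)."""
    if epoch <= 0:
        return 0.0
    return min(1.0, 2.0 ** (epoch - 8))

for epoch in range(13):
    print(epoch, data_fraction(epoch))
```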
### Framework versions
- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.3.0
- Tokenizers 0.22.1