39b22d3c370957e7005b3e7b6fdeafe0

This model is a fine-tuned version of distilbert/distilbert-base-cased on the sst2 subset of the nyu-mll/glue dataset. It achieves the following results on the evaluation set:

Loss: 0.3027
Data Size: 1.0
Epoch Runtime: 55.4605
Accuracy: 0.8889
F1 Macro: 0.8889
Rouge1: 0.8877
Rouge2: 0.0
Rougel: 0.8889
Rougelsum: 0.8889

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
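
The total batch sizes listed above follow from the per-device batch sizes and the device count. A minimal sketch of that arithmetic, assuming no gradient accumulation (the card lists none); the variable names simply mirror the hyperparameter keys:

```python
# Values copied from the hyperparameter list above.
train_batch_size = 8   # per-device training batch size
eval_batch_size = 8    # per-device evaluation batch size
num_devices = 4        # GPUs used in the multi-GPU run

# Assumption: no gradient accumulation, so the effective batch size
# is just the per-device size times the number of devices.
total_train_batch_size = train_batch_size * num_devices
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)  # 32 32
```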

Training results

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Accuracy	F1 Macro	Rouge1	Rougel	Rougelsum
No log	0	0	0.6998	0	0.8655	0.4907	0.3292	0.4907	0.4907	0.4919
No log	1	2104	0.5894	0.0078	2.3393	0.7257	0.7134	0.7257	0.7257	0.7257
No log	2	4208	0.4848	0.0156	1.8726	0.7940	0.7884	0.7946	0.7940	0.7940
0.0095	3	6312	0.3358	0.0312	2.7320	0.8495	0.8493	0.8495	0.8495	0.8495
0.3413	4	8416	0.2944	0.0625	4.3091	0.8808	0.8805	0.8808	0.8808	0.8808
0.27	5	10520	0.3139	0.125	7.7197	0.8808	0.8807	0.8808	0.8819	0.8808
0.1999	6	12624	0.3231	0.25	14.6671	0.8762	0.8756	0.8773	0.8762	0.8762
0.199	7	14728	0.3575	0.5	28.0818	0.8796	0.8786	0.8796	0.8796	0.8796
0.1481	8	16832	0.3027	1.0	55.4605	0.8889	0.8889	0.8877	0.8889	0.8889
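
A few patterns in the table are easier to see programmatically. The sketch below copies selected columns into a plain list and checks which epoch has the lowest validation loss, which has the highest accuracy, how the Data Size column grows, and how many steps separate consecutive rows. The tuple layout and variable names are illustrative only and are not part of the training code.

```python
# (epoch, step, validation_loss, data_size, accuracy), copied from the table above.
rows = [
    (0, 0, 0.6998, 0.0, 0.4907),
    (1, 2104, 0.5894, 0.0078, 0.7257),
    (2, 4208, 0.4848, 0.0156, 0.7940),
    (3, 6312, 0.3358, 0.0312, 0.8495),
    (4, 8416, 0.2944, 0.0625, 0.8808),
    (5, 10520, 0.3139, 0.125, 0.8808),
    (6, 12624, 0.3231, 0.25, 0.8762),
    (7, 14728, 0.3575, 0.5, 0.8796),
    (8, 16832, 0.3027, 1.0, 0.8889),
]

# Epoch with the lowest validation loss and epoch with the highest accuracy.
best_loss = min(rows, key=lambda r: r[2])
best_acc = max(rows, key=lambda r: r[4])

# Ratio between consecutive Data Size values for epochs 1..8
# (epoch 0 is skipped because its data size is 0).
ratios = [round(b[3] / a[3], 1) for a, b in zip(rows[1:], rows[2:])]

# Distinct step increments between consecutive rows.
steps_per_epoch = {b[1] - a[1] for a, b in zip(rows, rows[1:])}

print(best_loss[0], best_acc[0])  # 4 8
print(ratios)                     # [2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
print(steps_per_epoch)            # {2104}
```

The headline metrics at the top of the card match the final row (epoch 8), which has the highest accuracy, while the lowest validation loss in the table occurs at epoch 4. The Data Size values roughly double each epoch until they reach 1.0.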

Framework versions

Transformers 4.57.0
Pytorch 2.8.0+cu128
Datasets 4.3.0
Tokenizers 0.22.1

Safetensors

Model size: 65.8M params
Tensor type: F32