2d27387e8a3ceddc5fc7e93508f2c395

This model is a fine-tuned version of google-bert/bert-large-cased-whole-word-masking-finetuned-squad on the nyu-mll/glue [qqp] dataset. It achieves the following results on the evaluation set:

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: constant
num_epochs: 50
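The total batch sizes in the list follow from the per-device sizes and the device count. The sketch below reproduces that arithmetic, assuming no gradient accumulation (none is listed above), and uses the 11370 steps per epoch shown in the results table to estimate how many training examples one epoch covers. The variable names are illustrative, not taken from the training script.

```python
# Values copied from the hyperparameter list above.
train_batch_size = 8   # per-device training batch size
eval_batch_size = 8    # per-device evaluation batch size
num_devices = 4        # multi-GPU run

# Assumption: gradient accumulation is 1, since the card lists none.
gradient_accumulation_steps = 1

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

# Steps per epoch as reported in the results table (step 11370 at epoch 1).
steps_per_epoch = 11370
examples_per_epoch = steps_per_epoch * total_train_batch_size

print(total_train_batch_size, total_eval_batch_size, examples_per_epoch)
```

Under these assumptions one epoch covers roughly 364k training examples, which is in line with the size of the QQP training split.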

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Accuracy	F1 Macro	Rouge1	Rougel	Rougelsum
No log	0	0	0.6974	0	31.5363	0.6320	0.3872	0.6318	0.6319	0.6317
0.5654	1	11370	0.4451	0.0078	41.4103	0.7935	0.7839	0.7934	0.7935	0.7936
0.4531	2	22740	0.3932	0.0156	47.9747	0.8157	0.7929	0.8156	0.8157	0.8157
0.4384	3	34110	0.4163	0.0312	65.4212	0.8078	0.8033	0.8077	0.8077	0.8077
0.6736	4	45480	0.6581	0.0625	98.0096	0.6320	0.3872	0.6318	0.6319	0.6317
0.6749	5	56850	0.6581	0.125	162.4325	0.6320	0.3872	0.6318	0.6319	0.6317
0.6615	6	68220	0.6577	0.25	290.4668	0.6320	0.3872	0.6318	0.6319	0.6317
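The rows for epochs 4 to 6 repeat the step-0 metrics exactly (accuracy 0.6320, F1 macro 0.3872). A minimal sketch for reading the table: it picks the epoch with the lowest validation loss and flags rows whose macro F1 matches what a binary classifier would score if it always predicted the majority class. The check assumes the minority-class F1 is counted as 0 in that case; it shows the numbers are consistent with a collapsed model, not that collapse is the confirmed cause.

```python
# (epoch, validation_loss, accuracy, f1_macro), copied from the table above.
rows = [
    (0, 0.6974, 0.6320, 0.3872),
    (1, 0.4451, 0.7935, 0.7839),
    (2, 0.3932, 0.8157, 0.7929),
    (3, 0.4163, 0.8078, 0.8033),
    (4, 0.6581, 0.6320, 0.3872),
    (5, 0.6581, 0.6320, 0.3872),
    (6, 0.6577, 0.6320, 0.3872),
]


def constant_predictor_macro_f1(majority_fraction):
    """Macro F1 of a binary classifier that always predicts the majority class.

    Majority class: precision = majority_fraction, recall = 1.
    Minority class: never predicted, so its F1 is taken as 0.
    """
    f1_majority = 2 * majority_fraction / (1 + majority_fraction)
    return (f1_majority + 0.0) / 2


# Epoch with the lowest validation loss.
best_epoch = min(rows, key=lambda r: r[1])[0]

# Epochs whose macro F1 matches the constant-predictor value for their accuracy.
collapsed_epochs = [
    epoch
    for epoch, _, accuracy, f1_macro in rows
    if abs(f1_macro - constant_predictor_macro_f1(accuracy)) < 1e-3
]

print("best epoch by validation loss:", best_epoch)
print("epochs consistent with a constant predictor:", collapsed_epochs)
```

By this reading, the checkpoint from epoch 2 has the lowest validation loss, and the run after epoch 3 scores no better than the untrained classification head at step 0.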

Safetensors

Model size: 0.3B params
Tensor type: F32
