20b7e1ac90cf8c2e7d5aa3400bcf8acf

This model is a fine-tuned version of albert/albert-xxlarge-v1 on the nyu-mll/glue dataset. Its results on the evaluation set are listed per epoch in the table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
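
The total batch sizes follow from the per-device batch sizes and the device count. A minimal sketch of that arithmetic, assuming no gradient accumulation (the list above does not mention any, so `gradient_accumulation_steps = 1` is an assumption):

```python
# Values copied from the hyperparameter list above.
train_batch_size = 8   # per-device training batch size
eval_batch_size = 8    # per-device evaluation batch size
num_devices = 4

# Assumption: no gradient accumulation, since the card lists none.
gradient_accumulation_steps = 1

# Effective batch size seen by each optimizer step across all devices.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size)  # 32
print(total_eval_batch_size)   # 32
```

Both values match the total_train_batch_size and total_eval_batch_size reported above.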

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Accuracy	F1 Macro	Rouge1	Rougel	Rougelsum
No log	0	0	0.7388	0	0.6355	0.4375	0.3043	0.4375	0.4375	0.4375
No log	1	19	0.6920	0.0078	1.2543	0.5156	0.4284	0.5156	0.5156	0.5156
No log	2	38	0.6879	0.0156	0.7894	0.5312	0.3469	0.5312	0.5312	0.5312
No log	3	57	0.6934	0.0312	1.0514	0.5312	0.3469	0.5312	0.5312	0.5312
No log	4	76	0.7089	0.0625	1.1198	0.5312	0.4203	0.5312	0.5312	0.5312
No log	5	95	0.7070	0.125	1.1193	0.4531	0.4296	0.4531	0.4531	0.4531
0.0844	6	114	0.6899	0.25	1.2945	0.5	0.4995	0.5	0.5	0.5
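
To read the table at a glance, the logged rows can be scanned for the best epoch per metric. A short sketch, with the values copied by hand from the table above; the tuple layout and variable names are illustrative only:

```python
# Rows copied from the table above: (epoch, validation_loss, accuracy, f1_macro).
rows = [
    (0, 0.7388, 0.4375, 0.3043),
    (1, 0.6920, 0.5156, 0.4284),
    (2, 0.6879, 0.5312, 0.3469),
    (3, 0.6934, 0.5312, 0.3469),
    (4, 0.7089, 0.5312, 0.4203),
    (5, 0.7070, 0.4531, 0.4296),
    (6, 0.6899, 0.5000, 0.4995),
]

# Lowest validation loss and highest macro F1 among the logged epochs.
best_loss = min(rows, key=lambda r: r[1])
best_f1 = max(rows, key=lambda r: r[3])

print(f"lowest validation loss: {best_loss[1]} at epoch {best_loss[0]}")
print(f"highest F1 macro: {best_f1[3]} at epoch {best_f1[0]}")
```

Among the logged epochs, validation loss is lowest at epoch 2 (0.6879) and macro F1 is highest at epoch 6 (0.4995).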

Safetensors
model size: 0.2B params
tensor type: F32
