# 9bfecc547b76d8fecaa6c32be2e54343
This model is a fine-tuned version of [google/gemma-2b](https://huggingface.co/google/gemma-2b) on the QNLI subset of the nyu-mll/glue dataset. It achieves the following results on the evaluation set:
- Loss: 2.7438
- Data Size: 0.125
- Epoch Runtime: 89.7536
- Accuracy: 0.5557
- F1 Macro: 0.5555
- Rouge1: 0.5561
- Rouge2: 0.0
- Rougel: 0.5559
- Rougelsum: 0.5557
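
For reference, a minimal loading sketch, assuming the checkpoint is published on the Hub under `contemmcm/9bfecc547b76d8fecaa6c32be2e54343` and loads as a causal language model like its base model (the prompt format is an assumption; the card does not document it):

```python
# Minimal loading sketch; assumes the checkpoint is hosted on the Hugging Face
# Hub under contemmcm/9bfecc547b76d8fecaa6c32be2e54343 and, like google/gemma-2b,
# loads as a causal language model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "contemmcm/9bfecc547b76d8fecaa6c32be2e54343"  # assumed Hub repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# QNLI pairs a question with a sentence; the prompt format below is hypothetical.
prompt = "question: What is the capital of France? sentence: Paris is the capital of France."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```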
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
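
Absent details from the author, a hedged sketch of loading the GLUE QNLI task named in this card with the `datasets` library (the exact preprocessing used for fine-tuning is not documented):

```python
# Sketch of loading the GLUE QNLI task referenced by this card; only the raw
# splits are shown, since the fine-tuning preprocessing is not documented.
from datasets import load_dataset

qnli = load_dataset("nyu-mll/glue", "qnli")
print(qnli)              # train / validation / test splits
print(qnli["train"][0])  # fields: question, sentence, label, idx
```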
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- total_train_batch_size: 32
- total_eval_batch_size: 32
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: constant
- num_epochs: 50
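
As an illustration only, these hyperparameters map onto `transformers.TrainingArguments` roughly as follows; the trainer and model wiring, output path, and launch command are assumptions, not part of this card:

```python
# Rough reconstruction of the training configuration listed above.
# Launch with e.g. `torchrun --nproc_per_node=4 train.py` to match
# num_devices: 4; per-device batch size 8 x 4 GPUs = total batch size 32.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gemma-2b-qnli",      # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=8,   # x 4 GPUs = total_train_batch_size 32
    per_device_eval_batch_size=8,    # x 4 GPUs = total_eval_batch_size 32
    seed=42,
    optim="adamw_torch",             # betas=(0.9, 0.999), eps=1e-8 are the defaults
    lr_scheduler_type="constant",
    num_train_epochs=50,
)
```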
### Training results
| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | Accuracy | F1 Macro | Rouge1 | Rouge2 | Rougel | Rougelsum |
|---|---|---|---|---|---|---|---|---|---|---|---|
| No log | 0 | 0 | 3.6654 | 0 | 8.5171 | 0.5121 | 0.4545 | 0.5119 | 0.0 | 0.5121 | 0.5125 |
| No log | 1 | 3273 | 2.1014 | 0.0078 | 12.8466 | 0.7307 | 0.7224 | 0.7306 | 0.0 | 0.7311 | 0.7305 |
| 0.0499 | 2 | 6546 | 2.2790 | 0.0156 | 19.0899 | 0.7232 | 0.7094 | 0.7232 | 0.0 | 0.7237 | 0.7232 |
| 2.2861 | 3 | 9819 | 2.2956 | 0.0312 | 30.4848 | 0.7384 | 0.7273 | 0.7388 | 0.0 | 0.7388 | 0.7382 |
| 3.2397 | 4 | 13092 | 3.9930 | 0.0625 | 51.7418 | 0.5057 | 0.3359 | 0.5053 | 0.0 | 0.5057 | 0.5056 |
| 2.7864 | 5 | 16365 | 2.7438 | 0.125 | 89.7536 | 0.5557 | 0.5555 | 0.5561 | 0.0 | 0.5559 | 0.5557 |
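
The metric columns above can be computed with the `evaluate` library; a minimal sketch on placeholder predictions (the card does not specify how labels were decoded):

```python
# Sketch of computing the metric columns reported above with the
# `evaluate` library; predictions and references here are placeholders.
import evaluate

accuracy = evaluate.load("accuracy")
f1 = evaluate.load("f1")
rouge = evaluate.load("rouge")

preds, refs = [1, 0, 1, 1], [1, 0, 0, 1]  # dummy label ids
print(accuracy.compute(predictions=preds, references=refs))
print(f1.compute(predictions=preds, references=refs, average="macro"))

# ROUGE is computed on decoded label strings rather than label ids.
pred_texts = ["entailment", "not_entailment"]
ref_texts = ["entailment", "entailment"]
print(rouge.compute(predictions=pred_texts, references=ref_texts))
```

The constant Rouge2 of 0.0 across rows is consistent with single-word label strings, which contain no bigrams to overlap.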
### Framework versions
- Transformers 4.57.0
- Pytorch 2.8.0+cu128
- Datasets 4.3.0
- Tokenizers 0.22.1