bbfc8e75dac0d6ea1ca69ac37ff5beaa

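This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B on the nyu-mll/glue dataset. The card does not specify which GLUE task (configuration) was used. On the evaluation set it achieves the following results, which correspond to the last logged epoch (epoch 11) in the training results below: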

Loss: 5.0891
Data Size: 1.0
Epoch Runtime: 33.8007
Accuracy: 0.8149
F1 Macro: 0.7728
Rouge1: 0.8149
Rouge2: 0.0
Rougel: 0.8149
Rougelsum: 0.8154
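The card does not include its evaluation code. As a hedged reference for what "F1 Macro" usually means, the sketch below computes accuracy and macro-averaged F1 (the unweighted mean of per-class F1 scores) from plain label lists. The function names and toy labels are illustrative and are not taken from the training script.

```python
from collections import Counter


def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of predictions that exactly match the reference label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_macro(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted p, but the reference was a different label
            fn[t] += 1  # the reference label t was missed
    # Per-class F1 = 2TP / (2TP + FP + FN). The denominator is never zero here,
    # because every label in `labels` occurs in y_true or y_pred.
    scores = [2 * tp[c] / (2 * tp[c] + fp[c] + fn[c]) for c in labels]
    return sum(scores) / len(scores)


y_true = ["0", "0", "1", "1"]
y_pred = ["0", "1", "1", "1"]
print(round(accuracy(y_true, y_pred), 4))  # 0.75
print(round(f1_macro(y_true, y_pred), 4))  # 0.7333
```

Because every class counts equally in the macro average, macro F1 can fall below accuracy when a model does worse on a less frequent class.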

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch (AdamW) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
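The training script is not published. A minimal sketch, assuming the run used the Hugging Face Trainer, maps the listed hyperparameters onto `transformers.TrainingArguments` keyword names and checks the effective batch size (per-device batch × number of devices × gradient accumulation steps). `gradient_accumulation_steps=1` is an assumption because the card does not list it, and the `output_dir` value in the comment is hypothetical. A "constant" scheduler keeps the learning rate at 5e-05 for every step.

```python
# Hypothetical mapping of the card's hyperparameters onto
# transformers.TrainingArguments keyword names (not the author's script).
training_kwargs = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
    "gradient_accumulation_steps": 1,  # assumption: not listed in the card
}

NUM_DEVICES = 4  # from "num_devices: 4" (multi-GPU)


def total_batch_size(per_device: int, num_devices: int, grad_accum: int) -> int:
    """Examples consumed per optimizer step across all devices."""
    return per_device * num_devices * grad_accum


train_total = total_batch_size(
    training_kwargs["per_device_train_batch_size"],
    NUM_DEVICES,
    training_kwargs["gradient_accumulation_steps"],
)
eval_total = training_kwargs["per_device_eval_batch_size"] * NUM_DEVICES
print(train_total, eval_total)  # 32 32

# With transformers installed, the dict could be passed as, e.g.:
#   from transformers import TrainingArguments
#   args = TrainingArguments(output_dir="out", **training_kwargs)  # "out" is hypothetical
```

The computed totals match total_train_batch_size and total_eval_batch_size as listed.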

Training results

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Accuracy	F1 Macro	Rouge1	Rougel	Rougelsum
No log	0	0	7.0905	0	4.4920	0.3544	0.2989	0.3538	0.3538	0.3544
No log	1	114	7.6449	0.0078	4.3656	0.6639	0.4121	0.6645	0.6639	0.6639
No log	2	228	3.1042	0.0156	7.0459	0.5731	0.5525	0.5725	0.5728	0.5731
No log	3	342	2.7720	0.0312	10.1902	0.6769	0.5282	0.6769	0.6763	0.6763
0.1341	4	456	2.3659	0.0625	12.0846	0.6881	0.6353	0.6881	0.6881	0.6881
0.1341	5	570	2.1620	0.125	14.6870	0.7488	0.6646	0.7482	0.7488	0.7494
0.1341	6	684	1.7724	0.25	19.3808	0.8013	0.7475	0.8019	0.8019	0.8013
0.4926	7	798	1.5741	0.5	21.4771	0.8261	0.8001	0.8255	0.8261	0.8261
0.9023	8	912	1.7902	1.0	32.0793	0.8090	0.7649	0.8093	0.8090	0.8096
0.5336	9	1026	3.7415	1.0	32.3208	0.7995	0.7667	0.7989	0.8001	0.7995
0.4963	10	1140	4.2174	1.0	31.7095	0.8160	0.7719	0.8160	0.8166	0.8166
0.1717	11	1254	5.0891	1.0	33.8007	0.8149	0.7728	0.8149	0.8149	0.8154
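The headline results at the top of the card come from the last logged epoch rather than the best one. The sketch below copies the validation columns from the table above and reports which epoch scores best on each metric. The `RESULTS` list and the `best_epoch` helper are illustrative names, not part of any published code.

```python
# (epoch, validation_loss, accuracy, f1_macro), copied from the training-results table
RESULTS = [
    (0, 7.0905, 0.3544, 0.2989),
    (1, 7.6449, 0.6639, 0.4121),
    (2, 3.1042, 0.5731, 0.5525),
    (3, 2.7720, 0.6769, 0.5282),
    (4, 2.3659, 0.6881, 0.6353),
    (5, 2.1620, 0.7488, 0.6646),
    (6, 1.7724, 0.8013, 0.7475),
    (7, 1.5741, 0.8261, 0.8001),
    (8, 1.7902, 0.8090, 0.7649),
    (9, 3.7415, 0.7995, 0.7667),
    (10, 4.2174, 0.8160, 0.7719),
    (11, 5.0891, 0.8149, 0.7728),
]


def best_epoch(metric_index: int, higher_is_better: bool) -> int:
    """Return the epoch whose value in column `metric_index` is best."""
    pick = max if higher_is_better else min
    return pick(RESULTS, key=lambda row: row[metric_index])[0]


print("lowest validation loss:", best_epoch(1, higher_is_better=False))  # 7
print("highest accuracy:", best_epoch(2, higher_is_better=True))  # 7
print("highest F1 macro:", best_epoch(3, higher_is_better=True))  # 7
print("last logged epoch:", RESULTS[-1][0])  # 11
```

On these numbers, epoch 7 has the lowest validation loss (1.5741), the highest accuracy (0.8261), and the highest F1 macro (0.8001). After that, validation loss climbs to 5.0891 by epoch 11 while accuracy stays near 0.81.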

Framework versions

Transformers 4.57.0
PyTorch 2.8.0+cu128
Datasets 4.0.0
Tokenizers 0.22.1
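To reproduce the environment, one option is to compare installed packages against these versions using the standard library's `importlib.metadata`. This sketch assumes the usual PyPI distribution names (PyTorch is distributed as `torch`), and the helper names are hypothetical. The comparison is an exact string match, so `2.8.0` and `2.8.0+cu128` count as different, because the local `+cu128` tag records the CUDA build.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Optional

# Versions listed in the card, keyed by PyPI distribution name.
EXPECTED = {
    "transformers": "4.57.0",
    "torch": "2.8.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.22.1",
}


def installed_version(dist: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def version_mismatches(
    expected: dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> dict[str, tuple[str, Optional[str]]]:
    """Map each package whose installed version differs to (expected, installed)."""
    mismatches = {}
    for dist, want in expected.items():
        have = lookup(dist)
        if have != want:
            mismatches[dist] = (want, have)
    return mismatches


for dist, (want, have) in version_mismatches(EXPECTED).items():
    print(f"{dist}: card lists {want}, found {have}")
```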


Model size: 0.4B params (Safetensors, tensor type F32)

Model tree: contemmcm/bbfc8e75dac0d6ea1ca69ac37ff5beaa is fine-tuned from the base model deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B.