9b1edb80d486254ba50ce5e74026be52

This model is a fine-tuned version of Qwen/Qwen2.5-3B on the stsb subset of the nyu-mll/glue dataset. Per-epoch results on the evaluation set are listed in the training results table below.

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 4
total_train_batch_size: 32
total_eval_batch_size: 32
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: constant
num_epochs: 50
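
The relationship between the per-device and total batch sizes above can be sketched as follows. The dictionary keys follow the keyword names of Hugging Face `TrainingArguments`; the helper `total_batch_size` and the constant `NUM_DEVICES` are hypothetical names introduced for illustration, and a gradient accumulation factor of 1 is an assumption (it is not listed above, but it is consistent with 8 x 4 = 32).

```python
# Hyperparameters from the list above, keyed by the corresponding
# Hugging Face `TrainingArguments` keyword names.
training_kwargs = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "constant",
    "num_train_epochs": 50,
}

# num_devices from the list above (multi-GPU run).
NUM_DEVICES = 4


def total_batch_size(per_device: int, num_devices: int, grad_accum: int = 1) -> int:
    """Effective batch size per optimizer step across all devices.

    grad_accum=1 is an assumption; the card does not list it.
    """
    return per_device * num_devices * grad_accum


total_train = total_batch_size(
    training_kwargs["per_device_train_batch_size"], NUM_DEVICES
)
total_eval = total_batch_size(
    training_kwargs["per_device_eval_batch_size"], NUM_DEVICES
)
```

With `transformers` installed, the same dictionary could be unpacked into `TrainingArguments(output_dir=..., **training_kwargs)`; the output directory is not given in this card.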

Training Loss	Epoch	Step	Validation Loss	Data Size	Epoch Runtime	Mse	Mae	R2
No log	0	0	70.3202	0	4.9121	17.5819	3.7099	-6.8650
No log	1	179	318.1114	0.0078	5.2902	79.5276	8.3275	-34.5756
No log	2	358	25.7356	0.0156	11.4320	6.4342	2.0778	-1.8783
No log	3	537	8.8930	0.0312	17.1693	2.2241	1.2720	0.0051
No log	4	716	10.2854	0.0625	23.2553	2.5720	1.1787	-0.1505
No log	5	895	7.6714	0.125	33.7383	1.9183	1.1024	0.1419
1.908	6	1074	14.2409	0.25	43.7448	3.5610	1.1933	-0.5930
3.7605	7	1253	4.5245	0.5	45.1382	1.1317	0.8626	0.4938
3.4892	8.0	1432	3.4605	1.0	67.8121	0.8653	0.7061	0.6129
2.7979	9.0	1611	3.1841	1.0	60.9732	0.7963	0.7145	0.6438
2.6538	10.0	1790	5.1504	1.0	64.9600	1.2876	0.7095	0.4240
3.2084	11.0	1969	3.6981	1.0	65.1590	0.9247	0.7000	0.5864
1.5833	12.0	2148	3.4242	1.0	62.1172	0.8565	0.7233	0.6169
1.6063	13.0	2327	3.2559	1.0	62.3053	0.8143	0.7144	0.6357
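
The Mse, Mae, and R2 columns are standard regression metrics. The following is a minimal sketch of how they are conventionally defined, not the evaluation code used for this model; the function name `regression_metrics` and the sample values are illustrative assumptions. It also shows why R2 can be negative, as in the early epochs: predictions that fit worse than always predicting the mean label give R2 below zero.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (mse, mae, r2) for two equal-length sequences of floats."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    # Mean squared error and mean absolute error.
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n

    # R2 = 1 - SS_res / SS_tot, where SS_tot is measured around the label mean.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else math.nan

    return mse, mae, r2


# Illustrative similarity scores on the 0 to 5 STS-B scale (made-up values).
labels = [0.0, 2.5, 5.0]
predictions = [0.5, 2.5, 4.0]
mse, mae, r2 = regression_metrics(labels, predictions)

# Always predicting the label mean yields R2 = 0; anything worse is negative.
_, _, r2_mean_baseline = regression_metrics(labels, [2.5, 2.5, 2.5])
```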

Safetensors

Model size: 0.8B params

Tensor type: F32
