6609d0bf53122d36d71855333b8a0b23

This model is a fine-tuned version of deepseek-ai/DeepSeek-R1-Distill-Llama-8B on the STS-B subset (stsb) of the nyu-mll/glue dataset. It achieves the following results on the evaluation set:

  • Loss: 3.9567
  • Data Size: 1.0
  • Epoch Runtime: 192.4977
  • MSE: 0.9895
  • MAE: 0.8095
  • R²: 0.5573
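Since STS-B is a sentence-pair regression task (hence the MSE/MAE/R² metrics above), the checkpoint can presumably be loaded as a single-label sequence-regression model. A minimal sketch, assuming the standard transformers API and the repo id shown on this page; the card itself does not document usage:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Repo id assumed from this card's page; STS-B is sentence-pair
# regression, so the head is assumed to emit a single similarity logit.
model_id = "contemmcm/6609d0bf53122d36d71855333b8a0b23"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=1)
model.eval()

# STS-B rates the similarity of a sentence pair on a 0-5 scale.
inputs = tokenizer(
    "A man is playing a guitar.",
    "Someone is strumming a guitar.",
    return_tensors="pt",
)
with torch.no_grad():
    score = model(**inputs).logits.squeeze().item()
print(f"Predicted similarity: {score:.2f}")
```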

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
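The card does name the dataset, though: the stsb config of nyu-mll/glue. A minimal sketch of loading it with the datasets library (the splits and preprocessing actually used for training are not documented here):

```python
from datasets import load_dataset

# Dataset named in this card; this only shows the raw data, since the
# card does not document the preprocessing applied before training.
stsb = load_dataset("nyu-mll/glue", "stsb")
print(stsb)              # train/validation/test splits
print(stsb["train"][0])  # sentence1, sentence2, label (0-5 score)
```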

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • total_train_batch_size: 32
  • total_eval_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: constant
  • num_epochs: 50
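For reference, a rough sketch of how these values might map onto transformers TrainingArguments; output_dir is a placeholder, and the multi-GPU launch across the 4 devices would be handled externally (e.g. by torchrun or accelerate):

```python
from transformers import TrainingArguments

# Values mirror the hyperparameter list above; the effective batch
# size of 32 comes from 8 per device across 4 GPUs.
training_args = TrainingArguments(
    output_dir="stsb-finetune",  # placeholder path, not from the card
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    lr_scheduler_type="constant",
    num_train_epochs=50,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```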

Training results

| Training Loss | Epoch | Step | Validation Loss | Data Size | Epoch Runtime | MSE      | MAE     | R²        |
|:-------------:|:-----:|:----:|:---------------:|:---------:|:-------------:|:--------:|:-------:|:---------:|
| No log        | 0     | 0    | 115.6284        | 0         | 5.8382        | 28.9074  | 4.8139  | -11.9313  |
| No log        | 1     | 179  | 1602.5880       | 0.0078    | 6.7554        | 400.6450 | 17.8926 | -178.2229 |
| No log        | 2     | 358  | 8.9211          | 0.0156    | 12.3620       | 2.2311   | 1.2819  | 0.0020    |
| No log        | 3     | 537  | 9.4214          | 0.0312    | 28.3318       | 2.3562   | 1.3232  | -0.0540   |
| No log        | 4     | 716  | 10.6515         | 0.0625    | 40.3602       | 2.6636   | 1.3415  | -0.1915   |
| No log        | 5     | 895  | 9.8013          | 0.125     | 59.3143       | 2.4510   | 1.3016  | -0.0964   |
| 16.7373       | 6     | 1074 | 5.6460          | 0.25      | 93.2231       | 1.4121   | 0.9905  | 0.3683    |
| 5.7196        | 7     | 1253 | 4.3240          | 0.5       | 140.8782      | 1.0814   | 0.8347  | 0.5163    |
| 3.9842        | 8     | 1432 | 4.6427          | 1.0       | 226.9675      | 1.1611   | 0.8728  | 0.4806    |
| 2.7011        | 9     | 1611 | 4.6092          | 1.0       | 194.4204      | 1.1528   | 0.8548  | 0.4843    |
| 2.2374        | 10    | 1790 | 3.4777          | 1.0       | 197.2961      | 0.8699   | 0.7507  | 0.6109    |
| 1.6668        | 11    | 1969 | 4.9638          | 1.0       | 197.2742      | 1.2414   | 0.8617  | 0.4447    |
| 1.5425        | 12    | 2148 | 5.2001          | 1.0       | 190.3786      | 1.3006   | 0.9018  | 0.4182    |
| 1.3228        | 13    | 2327 | 5.1050          | 1.0       | 197.3229      | 1.2765   | 0.9036  | 0.4290    |
| 1.0798        | 14    | 2506 | 3.9567          | 1.0       | 192.4977      | 0.9895   | 0.8095  | 0.5573    |
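The MSE, MAE, and R² columns are standard regression metrics; a compute_metrics function along these lines would reproduce them with the Trainer (a sketch under that assumption, not the card's actual evaluation code):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def compute_metrics(eval_pred):
    # Single-logit regression outputs vs. gold STS-B similarity scores.
    predictions, labels = eval_pred
    predictions = np.squeeze(predictions)
    return {
        "mse": mean_squared_error(labels, predictions),
        "mae": mean_absolute_error(labels, predictions),
        "r2": r2_score(labels, predictions),
    }
```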

Framework versions

  • Transformers 4.57.0
  • PyTorch 2.8.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.1