Qwen3-4B-Base-hatebr-ep50

This model is a fine-tuned version of Qwen/Qwen3-4B-Base on an unknown dataset. On the evaluation set it reaches a validation loss of 1.0482 after the final epoch (epoch 50, step 56000).

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: linear
num_epochs: 50
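
As a rough illustration of what these settings imply, the sketch below computes the learning rate at a given step under a linear schedule and estimates the number of training examples per epoch. The step count per epoch (1120) comes from the results table. The zero warmup and the single-device, no-gradient-accumulation setup are assumptions, since the card does not list them.

```python
# Values taken from the hyperparameter list and the results table.
BASE_LR = 2e-05
TRAIN_BATCH_SIZE = 4
NUM_EPOCHS = 50
STEPS_PER_EPOCH = 1120  # step count logged at epoch 1.0
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS

# Assumption: no warmup, because the card lists none.
WARMUP_STEPS = 0


def linear_lr(step: int) -> float:
    """Learning rate after `step` optimizer steps with linear warmup and linear decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / max(1, WARMUP_STEPS)
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


# Assumption: one device and no gradient accumulation, so each step sees
# TRAIN_BATCH_SIZE examples. The last batch of an epoch may be smaller,
# which makes this an upper bound.
examples_per_epoch = STEPS_PER_EPOCH * TRAIN_BATCH_SIZE

if __name__ == "__main__":
    print(f"total steps: {TOTAL_STEPS}")
    print(f"examples per epoch (upper bound): {examples_per_epoch}")
    for epoch in (0, 1, 25, 50):
        print(f"epoch {epoch:>2}: lr = {linear_lr(epoch * STEPS_PER_EPOCH):.2e}")
```

Under these assumptions the learning rate falls to half its initial value at the midpoint of training (epoch 25) and to zero at step 56000.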

Training Loss	Epoch	Step	Validation Loss
0.5718	1.0	1120	0.5194
0.455	2.0	2240	0.5235
0.3395	3.0	3360	0.5680
0.2534	4.0	4480	0.6238
0.1885	5.0	5600	0.6882
0.1357	6.0	6720	0.7455
0.103	7.0	7840	0.7946
0.0943	8.0	8960	0.8471
0.0868	9.0	10080	0.8599
0.0789	10.0	11200	0.8890
0.0805	11.0	12320	0.9269
0.0774	12.0	13440	0.9286
0.0778	13.0	14560	0.9431
0.0762	14.0	15680	0.9620
0.0739	15.0	16800	0.9621
0.0743	16.0	17920	0.9743
0.0738	17.0	19040	0.9786
0.0731	18.0	20160	0.9828
0.0743	19.0	21280	0.9841
0.0721	20.0	22400	1.0077
0.0716	21.0	23520	0.9939
0.071	22.0	24640	1.0024
0.0709	23.0	25760	0.9973
0.0718	24.0	26880	0.9963
0.0703	25.0	28000	1.0101
0.0715	26.0	29120	1.0186
0.0687	27.0	30240	1.0106
0.0699	28.0	31360	1.0166
0.0685	29.0	32480	1.0141
0.0679	30.0	33600	1.0195
0.069	31.0	34720	1.0237
0.0682	32.0	35840	1.0196
0.0693	33.0	36960	1.0290
0.068	34.0	38080	1.0268
0.0698	35.0	39200	1.0320
0.0682	36.0	40320	1.0343
0.0669	37.0	41440	1.0357
0.0693	38.0	42560	1.0396
0.0676	39.0	43680	1.0409
0.0666	40.0	44800	1.0403
0.0676	41.0	45920	1.0420
0.0665	42.0	47040	1.0450
0.0677	43.0	48160	1.0457
0.0675	44.0	49280	1.0455
0.0681	45.0	50400	1.0466
0.0682	46.0	51520	1.0471
0.0658	47.0	52640	1.0481
0.0659	48.0	53760	1.0485
0.0661	49.0	54880	1.0482
0.0668	50.0	56000	1.0482
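
The table shows validation loss at its lowest after epoch 1 and rising afterwards while training loss keeps falling, a pattern usually read as overfitting. The sketch below picks the checkpoint with the lowest validation loss from a handful of rows copied from the table. The helper name `parse_log` is hypothetical, and validation loss is only one signal; the card reports no task metrics.

```python
# Rows copied from the results table:
# training loss, epoch, step, validation loss.
LOG = """
0.5718 1.0 1120 0.5194
0.455 2.0 2240 0.5235
0.3395 3.0 3360 0.5680
0.2534 4.0 4480 0.6238
0.1885 5.0 5600 0.6882
0.0668 50.0 56000 1.0482
"""


def parse_log(log: str) -> list[dict]:
    """Turn whitespace-separated log rows into dictionaries."""
    rows = []
    for line in log.strip().splitlines():
        train_loss, epoch, step, val_loss = line.split()
        rows.append(
            {
                "train_loss": float(train_loss),
                "epoch": float(epoch),
                "step": int(step),
                "val_loss": float(val_loss),
            }
        )
    return rows


rows = parse_log(LOG)

# Checkpoint with the lowest validation loss among the parsed rows.
best = min(rows, key=lambda r: r["val_loss"])

# Gap between validation and training loss at the final epoch.
final = rows[-1]
gap = final["val_loss"] - final["train_loss"]

if __name__ == "__main__":
    print(f"best epoch: {best['epoch']:.0f} (step {best['step']}, val loss {best['val_loss']:.4f})")
    print(f"final train/val gap: {gap:.4f}")
```

If intermediate checkpoints were saved, selecting by validation loss in this way would favor the epoch 1 checkpoint over the final one.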

Format: Safetensors
Model size: 4B params
Tensor type: BF16
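
The parameter count and tensor type give a back-of-the-envelope estimate of the memory needed for the weights alone. This is a sketch that treats "4B" as exactly 4e9 parameters, which is an assumption because the figure is rounded; activations, KV cache, and optimizer state are not included.

```python
# Assumption: "4B params" taken as exactly 4e9; the real count will differ somewhat.
NUM_PARAMS = 4e9
BYTES_PER_PARAM = 2  # BF16 stores each value in 16 bits

weight_bytes = NUM_PARAMS * BYTES_PER_PARAM
weight_gib = weight_bytes / 2**30

if __name__ == "__main__":
    print(f"approximate weight memory: {weight_gib:.2f} GiB")
```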
