# qwen3-0.6b-telecom-distilled

This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B) on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 10.2718
## Model description
More information needed
## Intended uses & limitations
More information needed
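
In the absence of documented usage instructions, here is a minimal loading sketch. It assumes the checkpoint is a PEFT adapter on top of the base model (PEFT is listed under framework versions below); the `adapter_id` is a placeholder for this repository's actual id or a local path, and the prompt is only an illustration:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3-0.6B"
adapter_id = "qwen3-0.6b-telecom-distilled"  # placeholder: replace with the real repo id or local path

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)

# Attach the fine-tuned adapter weights to the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)

prompt = "Explain what a handover is in a cellular network."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Alternatively, `peft`'s `AutoPeftModelForCausalLM.from_pretrained(adapter_id)` resolves the base model from the adapter config in a single call.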
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 0.0002
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 16
- optimizer: AdamW (`adamw_torch_fused`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 3
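
The list above maps onto a `TrainingArguments` reconstruction like the one below. This is a hedged sketch, not the author's script: the output directory is assumed, and everything the card does not state (dataset wiring, trainer choice, PEFT/LoRA config) is omitted.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen3-0.6b-telecom-distilled",  # assumed; not stated in the card
    learning_rate=2e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=8,  # 2 per device x 8 steps = total train batch size 16
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3,
)
```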
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 27.4065 | 0.1778 | 100 | 10.8046 |
| 27.4681 | 0.3556 | 200 | 10.2056 |
| 26.4315 | 0.5333 | 300 | 10.1465 |
| 26.6375 | 0.7111 | 400 | 10.2622 |
| 25.2635 | 0.8889 | 500 | 10.1981 |
| 26.9654 | 1.0658 | 600 | 10.1700 |
| 25.3722 | 1.2436 | 700 | 10.0219 |
| 26.5817 | 1.4213 | 800 | 10.2433 |
| 24.4284 | 1.5991 | 900 | 10.2607 |
| 25.5034 | 1.7769 | 1000 | 10.2287 |
| 25.934 | 1.9547 | 1100 | 10.2109 |
| 25.3019 | 2.1316 | 1200 | 10.1546 |
| 24.4273 | 2.3093 | 1300 | 10.1781 |
| 24.9056 | 2.4871 | 1400 | 10.2320 |
| 24.2637 | 2.6649 | 1500 | 10.2247 |
| 24.5714 | 2.8427 | 1600 | 10.2718 |
### Framework versions

- PEFT 0.18.0
- Transformers 4.57.3
- PyTorch 2.9.1+cu128
- Datasets 4.4.1
- Tokenizers 0.22.1