qwen3-0.6b-telecom-distilled

This model is a fine-tuned version of Qwen/Qwen3-0.6B on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 10.2718

Model description

More information needed

Intended uses & limitations

More information needed
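
The framework versions listed below include PEFT, and the model tree marks this repository as an adapter on Qwen/Qwen3-0.6B, so the checkpoint can presumably be loaded as a PEFT adapter. A minimal loading sketch, assuming the adapter is hosted at wlabchoi/qwen3-0.6b-telecom-distilled and uses the base model's tokenizer (the prompt is a hypothetical telecom example, not from this card):

```python
# Minimal sketch: load the PEFT adapter on top of its Qwen3-0.6B base.
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "wlabchoi/qwen3-0.6b-telecom-distilled"

# Tokenizer comes from the base model; the adapter repo only stores LoRA weights.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id)

# Hypothetical telecom-flavored prompt for a quick smoke test.
prompt = "Explain what a handover is in a cellular network."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```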

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 16
  • optimizer: adamw_torch_fused (betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 3
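
For reference, a hedged sketch of a transformers TrainingArguments object matching the values above. The dataset, model preparation, and Trainer wiring are not documented in this card, so only the listed hyperparameters are filled in, and the output directory is a placeholder:

```python
from transformers import TrainingArguments

# Sketch reproducing only the hyperparameters listed above; everything
# else about the training setup is unknown from this card.
training_args = TrainingArguments(
    output_dir="qwen3-0.6b-telecom-distilled",  # placeholder path
    learning_rate=2e-4,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=8,  # 2 per device x 8 steps = 16 effective
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=3,
)
```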

Training results

Training Loss   Epoch    Step   Validation Loss
27.4065         0.1778   100    10.8046
27.4681         0.3556   200    10.2056
26.4315         0.5333   300    10.1465
26.6375         0.7111   400    10.2622
25.2635         0.8889   500    10.1981
26.9654         1.0658   600    10.1700
25.3722         1.2436   700    10.0219
26.5817         1.4213   800    10.2433
24.4284         1.5991   900    10.2607
25.5034         1.7769   1000   10.2287
25.9340         1.9547   1100   10.2109
25.3019         2.1316   1200   10.1546
24.4273         2.3093   1300   10.1781
24.9056         2.4871   1400   10.2320
24.2637         2.6649   1500   10.2247
24.5714         2.8427   1600   10.2718

Framework versions

  • PEFT 0.18.0
  • Transformers 4.57.3
  • Pytorch 2.9.1+cu128
  • Datasets 4.4.1
  • Tokenizers 0.22.1
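
To check that a local environment matches the versions above, a quick sketch in Python (expected values are taken from the list; the +cu128 suffix on PyTorch depends on which CUDA build is installed):

```python
# Compare installed package versions against those used for training.
import datasets, peft, tokenizers, torch, transformers

expected = {
    "peft": "0.18.0",
    "transformers": "4.57.3",
    "torch": "2.9.1+cu128",  # CUDA 12.8 build; suffix varies by install
    "datasets": "4.4.1",
    "tokenizers": "0.22.1",
}
installed = {
    "peft": peft.__version__,
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name in expected:
    print(f"{name}: installed {installed[name]}, expected {expected[name]}")
```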
