qwen2_5_sft_lora

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the mental_train_zh and mental_train_en datasets. It achieves the following results on the evaluation set:

  • Loss: 0.7574
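
The adapter can be loaded on top of the base model with PEFT. Below is a minimal usage sketch; the prompt and generation settings are illustrative and not taken from the training run:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in bfloat16 (the adapter weights are stored as BF16).
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(base, "Tonikroos/Qwen_2.5_7B_lora_sft_Practice")

# Qwen2.5-Instruct models use a chat template; build a prompt and generate.
messages = [{"role": "user", "content": "I have been feeling very stressed lately."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```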

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list for how they map onto a training configuration):

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 8
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 64
  • total_eval_batch_size: 64
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.05
  • num_epochs: 5.0
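
As a rough illustration, these values correspond to the following transformers TrainingArguments. This is a reconstruction, not the actual training script: output_dir is hypothetical, and bf16 is assumed from the BF16 tensor type of the adapter weights.

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="qwen2_5_sft_lora",   # hypothetical output path
    learning_rate=1e-5,
    per_device_train_batch_size=4,   # x 8 GPUs x 2 accumulation steps = 64 total
    per_device_eval_batch_size=8,    # x 8 GPUs = 64 total
    gradient_accumulation_steps=2,
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    seed=42,
    optim="adamw_torch",             # AdamW with betas=(0.9, 0.999), eps=1e-8 (defaults)
    bf16=True,                       # assumed from the BF16 adapter weights
)
```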

Training results

Training Loss   Epoch    Step   Validation Loss
3.1493          0.2326     30   2.9270
1.6694          0.4651     60   1.6363
1.0719          0.6977     90   1.1152
0.9684          0.9302    120   0.9354
0.8318          1.1628    150   0.8653
0.8913          1.3953    180   0.8304
0.7614          1.6279    210   0.8097
0.8414          1.8605    240   0.7961
0.7691          2.0930    270   0.7862
0.7123          2.3256    300   0.7794
0.8115          2.5581    330   0.7741
0.7036          2.7907    360   0.7698
0.7824          3.0233    390   0.7660
0.7175          3.2558    420   0.7639
0.7550          3.4884    450   0.7611
0.7646          3.7209    480   0.7598
0.7197          3.9535    510   0.7587
0.7992          4.1860    540   0.7580
0.6624          4.4186    570   0.7575
0.7216          4.6512    600   0.7573
0.7233          4.8837    630   0.7573

Framework versions

  • PEFT 0.12.0
  • Transformers 4.49.0
  • Pytorch 2.6.0+cu124
  • Datasets 3.3.2
  • Tokenizers 0.21.0
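
The card does not record the LoRA configuration itself (rank, alpha, dropout, target modules). For reference, a typical PEFT setup for a Qwen2.5-7B-Instruct adapter looks like the hypothetical sketch below; every value shown is an assumption, not a setting recovered from this run:

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# Hypothetical LoRA settings; the actual rank/alpha/targets used for
# this adapter are not documented in the card.
lora_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```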