# qwen2_5_sft_lora

This model is a fine-tuned version of Qwen/Qwen2.5-7B-Instruct on the mental_train_zh and mental_train_en datasets. It achieves the following results on the evaluation set:
- Loss: 0.7574
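A minimal usage sketch (not part of the original card), assuming the LoRA adapter is published under the placeholder id `qwen2_5_sft_lora`; substitute the actual adapter repo or local path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen2.5-7B-Instruct"
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(base_id)

# "qwen2_5_sft_lora" is a placeholder; point this at the actual adapter repo/path.
model = PeftModel.from_pretrained(base, "qwen2_5_sft_lora")

messages = [{"role": "user", "content": "I've been feeling very stressed lately."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```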
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 2
- total_train_batch_size: 64
- total_eval_batch_size: 64
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 5.0
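For readers who want to reproduce the setup, here is a sketch of how these values map onto `transformers`/`peft`. The LoRA rank, alpha, and target modules are not stated on this card and are illustrative assumptions; the remaining values come from the list above.

```python
from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

# LoRA settings are NOT given on this card; r/alpha/target_modules are illustrative.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Values below are taken from the hyperparameter list above.
args = TrainingArguments(
    output_dir="qwen2_5_sft_lora",
    learning_rate=1e-5,
    per_device_train_batch_size=4,   # train_batch_size
    per_device_eval_batch_size=8,    # eval_batch_size
    gradient_accumulation_steps=2,   # 4 per device * 8 GPUs * 2 steps = 64 total
    num_train_epochs=5.0,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    seed=42,
    optim="adamw_torch",             # AdamW, betas=(0.9, 0.999), eps=1e-8
)
# These args would then be passed to a Trainer together with the SFT datasets (not shown).
```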
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 3.1493 | 0.2326 | 30 | 2.9270 |
| 1.6694 | 0.4651 | 60 | 1.6363 |
| 1.0719 | 0.6977 | 90 | 1.1152 |
| 0.9684 | 0.9302 | 120 | 0.9354 |
| 0.8318 | 1.1628 | 150 | 0.8653 |
| 0.8913 | 1.3953 | 180 | 0.8304 |
| 0.7614 | 1.6279 | 210 | 0.8097 |
| 0.8414 | 1.8605 | 240 | 0.7961 |
| 0.7691 | 2.0930 | 270 | 0.7862 |
| 0.7123 | 2.3256 | 300 | 0.7794 |
| 0.8115 | 2.5581 | 330 | 0.7741 |
| 0.7036 | 2.7907 | 360 | 0.7698 |
| 0.7824 | 3.0233 | 390 | 0.7660 |
| 0.7175 | 3.2558 | 420 | 0.7639 |
| 0.755 | 3.4884 | 450 | 0.7611 |
| 0.7646 | 3.7209 | 480 | 0.7598 |
| 0.7197 | 3.9535 | 510 | 0.7587 |
| 0.7992 | 4.1860 | 540 | 0.7580 |
| 0.6624 | 4.4186 | 570 | 0.7575 |
| 0.7216 | 4.6512 | 600 | 0.7573 |
| 0.7233 | 4.8837 | 630 | 0.7573 |
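For reference, the final validation loss of 0.7573 corresponds to a token-level perplexity of exp(0.7573) ≈ 2.13 on the evaluation set.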
### Framework versions
- PEFT 0.12.0
- Transformers 4.49.0
- Pytorch 2.6.0+cu124
- Datasets 3.3.2
- Tokenizers 0.21.0