Tags: PEFT · Safetensors · qwen3 · Generated from Trainer

Built with Axolotl

See axolotl config

axolotl version: 0.10.0.dev0

base_model: Qwen/Qwen3-14B

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: nl2json_main_dataset_n2527.jsonl
    type: alpaca

  - path: 24_game_filtered.jsonl
    type: alpaca

  - path: blocksworld_filtered.jsonl
    type: alpaca

test_datasets:
  - path: validation.jsonl
    ds_type: json
    # You need to specify a split. For "json" datasets the default split is called "train".
    split: train
    type: alpaca
    data_files:
      - /workspace/axolotl/examples/Qwen3/validation.jsonl

special_tokens:

dataset_prepared_path:
val_set_size: 0
output_dir: /workspace/axolotl/examples/Qwen3/outputs/

sequence_len: 2048
sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

wandb_project: dywoo_axolotl
wandb_entity: dywoo
wandb_watch:
wandb_run_id: 
wandb_log_model:

gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 3
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.00005
train_on_inputs:
group_by_length: false
bf16: true
fp16: false
tf32: false
gradient_checkpointing: true
logging_steps: 20
xformers_attention:
flash_attention: true
warmup_ratio: 0.01
eval_steps: 100
save_steps: 100
save_total_limit: 2
eval_sample_packing:
debug:
deepspeed:
weight_decay: 0.01
fsdp:
fsdp_config:
save_safetensors: true
trust_remote_code: true
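
For readers more familiar with PEFT's Python API than with Axolotl, the adapter section of the config above corresponds roughly to the following LoraConfig. This is a sketch for orientation only; Axolotl builds its PEFT config internally, and the bias and task_type values here are assumed defaults, not taken from the config.

```python
from peft import LoraConfig

# Rough PEFT equivalent of the adapter settings in the Axolotl config above.
# bias and task_type are assumed defaults, not read from the config file.
lora_config = LoraConfig(
    r=8,                     # lora_r
    lora_alpha=16,           # lora_alpha
    lora_dropout=0.05,       # lora_dropout
    target_modules=[         # lora_target_modules (all linear projections)
        "gate_proj", "down_proj", "up_proj",
        "q_proj", "v_proj", "k_proj", "o_proj",
    ],
    bias="none",             # assumption
    task_type="CAUSAL_LM",   # assumption
)
```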

dywoo/Qwen3-14B-V2-Plan

This model is a LoRA adapter for Qwen/Qwen3-14B, fine-tuned on the nl2json_main_dataset_n2527.jsonl, 24_game_filtered.jsonl, and blocksworld_filtered.jsonl datasets. It achieves the following results on the evaluation set:

  • Loss: 0.0770
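
Because this is a LoRA adapter rather than a full checkpoint, it must be loaded on top of the Qwen/Qwen3-14B base weights. A minimal loading sketch with transformers and peft, assuming the adapter id dywoo/Qwen3-14B-V2-Plan on the Hub; the prompt is a hypothetical example, not from the training data:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen3-14B"
adapter_id = "dywoo/Qwen3-14B-V2-Plan"  # this adapter's Hub id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # training ran in bf16
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "Translate the following request into JSON: ..."  # hypothetical prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```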

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 2
  • optimizer: paged_adamw_32bit (PAGED_ADAMW) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 65 (derived from warmup_ratio: 0.01; see the sketch after this list)
  • num_epochs: 3.0
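
The warmup count follows from warmup_ratio: 0.01. The results table below reaches step 6500 at epoch ~2.96, i.e. roughly 2,195 optimizer steps per epoch and ~6,585 steps over 3 epochs; one percent of that, truncated, gives 65. A quick check in Python (the truncation is an assumption about how Axolotl rounds, not confirmed by the config):

```python
# Back-of-the-envelope check that warmup_steps = 65 follows from warmup_ratio = 0.01.
# Figures are read off the training-results table below.
steps_per_epoch = 6500 / 2.9613          # last logged step / its epoch  ~= 2195
total_steps = steps_per_epoch * 3        # num_epochs: 3                 ~= 6585
warmup_steps = int(0.01 * total_steps)   # warmup_ratio: 0.01            -> 65
print(round(steps_per_epoch), round(total_steps), warmup_steps)  # 2195 6585 65
```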

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|---------------|--------|------|-----------------|
| No log        | 0.0005 | 1    | 0.5617          |
| 0.2266        | 0.0456 | 100  | 0.1534          |
| 0.152         | 0.0911 | 200  | 0.1350          |
| 0.0877        | 0.1367 | 300  | 0.1187          |
| 0.1362        | 0.1822 | 400  | 0.1010          |
| 0.1388        | 0.2278 | 500  | 0.1009          |
| 0.1244        | 0.2733 | 600  | 0.0917          |
| 0.1416        | 0.3189 | 700  | 0.0822          |
| 0.1028        | 0.3645 | 800  | 0.0897          |
| 0.0818        | 0.4100 | 900  | 0.1017          |
| 0.0859        | 0.4556 | 1000 | 0.0913          |
| 0.1335        | 0.5011 | 1100 | 0.0817          |
| 0.1191        | 0.5467 | 1200 | 0.0759          |
| 0.0613        | 0.5923 | 1300 | 0.0882          |
| 0.1186        | 0.6378 | 1400 | 0.0896          |
| 0.1023        | 0.6834 | 1500 | 0.0852          |
| 0.0733        | 0.7289 | 1600 | 0.0762          |
| 0.0663        | 0.7745 | 1700 | 0.0777          |
| 0.1212        | 0.8200 | 1800 | 0.0870          |
| 0.097         | 0.8656 | 1900 | 0.0871          |
| 0.1453        | 0.9112 | 2000 | 0.0793          |
| 0.1384        | 0.9567 | 2100 | 0.0762          |
| 0.0929        | 1.0023 | 2200 | 0.0817          |
| 0.043         | 1.0478 | 2300 | 0.0834          |
| 0.0668        | 1.0934 | 2400 | 0.0933          |
| 0.1073        | 1.1390 | 2500 | 0.0813          |
| 0.1035        | 1.1845 | 2600 | 0.0802          |
| 0.0592        | 1.2301 | 2700 | 0.0868          |
| 0.0849        | 1.2756 | 2800 | 0.0695          |
| 0.0585        | 1.3212 | 2900 | 0.0695          |
| 0.1156        | 1.3667 | 3000 | 0.0773          |
| 0.1327        | 1.4123 | 3100 | 0.0781          |
| 0.0901        | 1.4579 | 3200 | 0.0804          |
| 0.0984        | 1.5034 | 3300 | 0.0571          |
| 0.089         | 1.5490 | 3400 | 0.0652          |
| 0.0754        | 1.5945 | 3500 | 0.0721          |
| 0.0588        | 1.6401 | 3600 | 0.0715          |
| 0.0973        | 1.6856 | 3700 | 0.0714          |
| 0.0633        | 1.7312 | 3800 | 0.0667          |
| 0.1497        | 1.7768 | 3900 | 0.0584          |
| 0.0915        | 1.8223 | 4000 | 0.0643          |
| 0.0947        | 1.8679 | 4100 | 0.0625          |
| 0.0967        | 1.9134 | 4200 | 0.0683          |
| 0.104         | 1.9590 | 4300 | 0.0708          |
| 0.0585        | 2.0046 | 4400 | 0.0716          |
| 0.1116        | 2.0501 | 4500 | 0.0694          |
| 0.0763        | 2.0957 | 4600 | 0.0724          |
| 0.0646        | 2.1412 | 4700 | 0.0759          |
| 0.0852        | 2.1868 | 4800 | 0.0794          |
| 0.0952        | 2.2323 | 4900 | 0.0754          |
| 0.0646        | 2.2779 | 5000 | 0.0687          |
| 0.0844        | 2.3235 | 5100 | 0.0695          |
| 0.0775        | 2.3690 | 5200 | 0.0706          |
| 0.0775        | 2.4146 | 5300 | 0.0719          |
| 0.1177        | 2.4601 | 5400 | 0.0740          |
| 0.0594        | 2.5057 | 5500 | 0.0740          |
| 0.1008        | 2.5513 | 5600 | 0.0752          |
| 0.0753        | 2.5968 | 5700 | 0.0760          |
| 0.0649        | 2.6424 | 5800 | 0.0765          |
| 0.066         | 2.6879 | 5900 | 0.0764          |
| 0.1033        | 2.7335 | 6000 | 0.0767          |
| 0.0625        | 2.7790 | 6100 | 0.0766          |
| 0.0693        | 2.8246 | 6200 | 0.0768          |
| 0.0947        | 2.8702 | 6300 | 0.0772          |
| 0.0603        | 2.9157 | 6400 | 0.0771          |
| 0.0544        | 2.9613 | 6500 | 0.0770          |
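
The best validation loss in the table (0.0571 at step 3300, epoch ~1.50) is lower than the final 0.0770, so the curve is worth inspecting rather than only reading the last row. The HF Trainer writes its log history to trainer_state.json in the output directory and in each checkpoint directory; a sketch for extracting the eval-loss curve from it, assuming the default Trainer layout:

```python
import json

# trainer_state.json is written by the HF Trainer into output_dir and checkpoint dirs.
with open("/workspace/axolotl/examples/Qwen3/outputs/trainer_state.json") as f:
    state = json.load(f)

# log_history mixes training and eval logs; eval entries carry an "eval_loss" key.
eval_points = [
    (entry["step"], entry["eval_loss"])
    for entry in state["log_history"]
    if "eval_loss" in entry
]
best_step, best_loss = min(eval_points, key=lambda p: p[1])
print(f"best eval_loss {best_loss:.4f} at step {best_step}")  # expected: 0.0571 at 3300
```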

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.1
  • Tokenizers 0.21.1