Axolotl config

axolotl version: 0.10.0.dev0

base_model: Qwen/Qwen3-14B

load_in_8bit: false
load_in_4bit: false
strict: false

datasets:
  - path: nl2json_main_dataset_n2527.jsonl
    type: alpaca

  - path: 24_game_filtered.jsonl
    type: alpaca

  - path: blocksworld_filtered.jsonl
    type: alpaca

test_datasets:
  - path: validation.jsonl
    ds_type: json
    # You need to specify a split. For "json" datasets the default split is called "train".
    split: train
    type: alpaca
    data_files:
      - /workspace/axolotl/examples/Qwen3/validation.jsonl

special_tokens:

dataset_prepared_path:
val_set_size: 0
output_dir: /workspace/axolotl/examples/Qwen3/outputs/

sequence_len: 2048
sample_packing: false
pad_to_sequence_len: true

adapter: lora
lora_model_dir:
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:
lora_target_modules:
  - gate_proj
  - down_proj
  - up_proj
  - q_proj
  - v_proj
  - k_proj
  - o_proj

wandb_project: dywoo_axolotl
wandb_entity: dywoo
wandb_watch:
wandb_run_id: 
wandb_log_model:

gradient_accumulation_steps: 2
micro_batch_size: 1
num_epochs: 3
optimizer: paged_adamw_32bit
lr_scheduler: cosine
learning_rate: 0.00005
train_on_inputs:
group_by_length: false
bf16: true
fp16: false
tf32: false
gradient_checkpointing: true
logging_steps: 20
xformers_attention:
flash_attention: true
warmup_ratio: 0.01
eval_steps: 100
save_steps: 100
save_total_limit: 2
eval_sample_packing:
debug:
deepspeed:
weight_decay: 0.01
fsdp:
fsdp_config:
save_safetensors: true
trust_remote_code: true
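The config sets `adapter: lora` with `lora_r: 8` and `lora_alpha: 16`. Under LoRA, each adapted weight W (shape out × in) gets a trainable pair A (r × in) and B (out × r), and the update is scaled by lora_alpha / lora_r = 16 / 8 = 2. The sketch below counts the trainable parameters this produces for the seven listed projections. The Qwen3-14B dimensions in `DecoderDims` (hidden size 5120, intermediate size 17408, 40 layers, 40 query heads and 8 key/value heads of dimension 128) are assumptions, not values stated in this card; check them against the base model's `config.json` before relying on the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderDims:
    """Assumed Qwen3-14B shapes; verify against the base model's config.json."""
    hidden_size: int = 5120
    intermediate_size: int = 17408
    num_layers: int = 40
    num_heads: int = 40
    num_kv_heads: int = 8
    head_dim: int = 128


def projection_shapes(d: DecoderDims) -> dict[str, tuple[int, int]]:
    """(in_features, out_features) of each LoRA-targeted linear layer."""
    q_out = d.num_heads * d.head_dim
    kv_out = d.num_kv_heads * d.head_dim
    return {
        "q_proj": (d.hidden_size, q_out),
        "k_proj": (d.hidden_size, kv_out),
        "v_proj": (d.hidden_size, kv_out),
        "o_proj": (q_out, d.hidden_size),
        "gate_proj": (d.hidden_size, d.intermediate_size),
        "up_proj": (d.hidden_size, d.intermediate_size),
        "down_proj": (d.intermediate_size, d.hidden_size),
    }


def lora_trainable_params(d: DecoderDims, r: int) -> int:
    """Each adapted layer adds A (r x in) and B (out x r): r * (in + out) weights."""
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in projection_shapes(d).values())
    return per_layer * d.num_layers


if __name__ == "__main__":
    dims = DecoderDims()
    lora_r, lora_alpha = 8, 16  # values from the config above
    print("LoRA scaling (alpha / r):", lora_alpha / lora_r)
    print("trainable LoRA params:", lora_trainable_params(dims, lora_r))
```

Under the assumed shapes this gives a scaling of 2.0 and 32,112,640 trainable parameters, well under 1% of the roughly 15B parameters reported for the checkpoint. Because `lora_target_linear: true` is also set, axolotl should target every linear layer regardless of the explicit list; for this architecture that is assumed to be the same seven projections.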

workspace/axolotl/examples/Qwen3/outputs/

This model is a LoRA fine-tune of Qwen/Qwen3-14B on three datasets: nl2json_main_dataset_n2527.jsonl, 24_game_filtered.jsonl, and blocksworld_filtered.jsonl. It achieves the following result on the evaluation set:

Loss: 0.0770
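If the reported loss is the mean per-token cross-entropy in nats (the usual Trainer convention; with `train_on_inputs` left unset it is presumably averaged over response tokens only), it corresponds to the per-token perplexity below. This is an interpretation of the logged number, not a separately measured metric.

```latex
\mathrm{PPL} = \exp(\mathcal{L}_{\text{val}}) = \exp(0.0770) \approx 1.080
```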

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

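Training used three alpaca-format JSONL files: nl2json_main_dataset_n2527.jsonl, 24_game_filtered.jsonl, and blocksworld_filtered.jsonl. Evaluation used a separate alpaca-format file, validation.jsonl (configured under test_datasets; val_set_size is 0, so no split was carved out of the training data).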
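The training files and validation.jsonl are all configured with `type: alpaca`, so each JSONL line is expected to be an object with `instruction` and `output` strings and an optional `input`. The sketch below checks that shape and renders a record using the standard Stanford Alpaca prompt wording. Axolotl's alpaca prompter is assumed to use essentially this template, but its exact system text and whitespace may differ, so treat the rendered string as illustrative. The helper names and the sample record are hypothetical.

```python
import json

# Standard Stanford Alpaca wording (assumed; axolotl's template may differ slightly).
PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input that provides "
    "further context. Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)
PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. Write a response that "
    "appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)


def parse_alpaca_line(line: str) -> dict:
    """Parse one JSONL line and check the alpaca fields (hypothetical helper)."""
    record = json.loads(line)
    for key in ("instruction", "output"):
        if not isinstance(record.get(key), str):
            raise ValueError(f"missing or non-string field: {key!r}")
    if not isinstance(record.get("input", ""), str):
        raise ValueError("field 'input' must be a string when present")
    return record


def render_prompt(record: dict) -> str:
    """Build the prompt text; the model learns to continue it with record['output']."""
    if record.get("input"):
        return PROMPT_WITH_INPUT.format(instruction=record["instruction"], input=record["input"])
    return PROMPT_NO_INPUT.format(instruction=record["instruction"])


if __name__ == "__main__":
    # Hypothetical record, not taken from the actual datasets.
    sample = '{"instruction": "Convert to JSON: name Ada, age 36", "input": "", "output": "{\\"name\\": \\"Ada\\", \\"age\\": 36}"}'
    rec = parse_alpaca_line(sample)
    print(render_prompt(rec) + rec["output"])
```

Running the check over every line of a file before training surfaces malformed records early, rather than as a tokenization error partway through preprocessing.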

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 1
eval_batch_size: 1
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 2
optimizer: OptimizerNames.PAGED_ADAMW (paged AdamW, configured as paged_adamw_32bit) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 65
num_epochs: 3.0
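Several of these values follow from the config. total_train_batch_size = train_batch_size × gradient_accumulation_steps = 1 × 2 = 2, which implies a single device. The results table logs step 6500 at epoch 2.9613, so an epoch is about 2195 optimizer steps and the full run about 6585 steps; truncating 0.01 × 6585 = 65.85 is consistent with the 65 warmup steps reported. The sketch below reproduces that arithmetic and a linear-warmup cosine schedule with the same shape as transformers' `get_cosine_schedule_with_warmup`. It is a pure-Python reimplementation for illustration; that axolotl delegates to that scheduler for `lr_scheduler: cosine` is an assumption, and the 6585-step total is inferred from the table rather than logged.

```python
import math


def total_batch_size(micro_batch_size: int, grad_accum: int, world_size: int = 1) -> int:
    """Samples consumed per optimizer step."""
    return micro_batch_size * grad_accum * world_size


def warmup_steps(warmup_ratio: float, total_steps: int) -> int:
    """Truncate ratio * steps; matches the reported 65 for ~6585 steps."""
    return int(warmup_ratio * total_steps)


def cosine_lr(step: int, base_lr: float, warmup: int, total: int) -> float:
    """Linear warmup to base_lr, then half-cosine decay to 0 at `total`."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    steps_per_epoch = round(6500 / 2.9613)  # inferred from the last row of the results table
    total = 3 * steps_per_epoch
    warm = warmup_steps(0.01, total)
    print("batch:", total_batch_size(1, 2), "total steps:", total, "warmup:", warm)
    for s in (0, warm, total // 2, total):
        print(s, f"{cosine_lr(s, 5e-5, warm, total):.2e}")
```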

Training results

Training Loss	Epoch	Step	Validation Loss
No log	0.0005	1	0.5617
0.2266	0.0456	100	0.1534
0.152	0.0911	200	0.1350
0.0877	0.1367	300	0.1187
0.1362	0.1822	400	0.1010
0.1388	0.2278	500	0.1009
0.1244	0.2733	600	0.0917
0.1416	0.3189	700	0.0822
0.1028	0.3645	800	0.0897
0.0818	0.4100	900	0.1017
0.0859	0.4556	1000	0.0913
0.1335	0.5011	1100	0.0817
0.1191	0.5467	1200	0.0759
0.0613	0.5923	1300	0.0882
0.1186	0.6378	1400	0.0896
0.1023	0.6834	1500	0.0852
0.0733	0.7289	1600	0.0762
0.0663	0.7745	1700	0.0777
0.1212	0.8200	1800	0.0870
0.097	0.8656	1900	0.0871
0.1453	0.9112	2000	0.0793
0.1384	0.9567	2100	0.0762
0.0929	1.0023	2200	0.0817
0.043	1.0478	2300	0.0834
0.0668	1.0934	2400	0.0933
0.1073	1.1390	2500	0.0813
0.1035	1.1845	2600	0.0802
0.0592	1.2301	2700	0.0868
0.0849	1.2756	2800	0.0695
0.0585	1.3212	2900	0.0695
0.1156	1.3667	3000	0.0773
0.1327	1.4123	3100	0.0781
0.0901	1.4579	3200	0.0804
0.0984	1.5034	3300	0.0571
0.089	1.5490	3400	0.0652
0.0754	1.5945	3500	0.0721
0.0588	1.6401	3600	0.0715
0.0973	1.6856	3700	0.0714
0.0633	1.7312	3800	0.0667
0.1497	1.7768	3900	0.0584
0.0915	1.8223	4000	0.0643
0.0947	1.8679	4100	0.0625
0.0967	1.9134	4200	0.0683
0.104	1.9590	4300	0.0708
0.0585	2.0046	4400	0.0716
0.1116	2.0501	4500	0.0694
0.0763	2.0957	4600	0.0724
0.0646	2.1412	4700	0.0759
0.0852	2.1868	4800	0.0794
0.0952	2.2323	4900	0.0754
0.0646	2.2779	5000	0.0687
0.0844	2.3235	5100	0.0695
0.0775	2.3690	5200	0.0706
0.0775	2.4146	5300	0.0719
0.1177	2.4601	5400	0.0740
0.0594	2.5057	5500	0.0740
0.1008	2.5513	5600	0.0752
0.0753	2.5968	5700	0.0760
0.0649	2.6424	5800	0.0765
0.066	2.6879	5900	0.0764
0.1033	2.7335	6000	0.0767
0.0625	2.7790	6100	0.0766
0.0693	2.8246	6200	0.0768
0.0947	2.8702	6300	0.0772
0.0603	2.9157	6400	0.0771
0.0544	2.9613	6500	0.0770
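The final validation loss (0.0770) is not the lowest in the table: the minimum is 0.0571 at step 3300 (epoch 1.5034), after which validation loss drifts back up. The sketch below parses the tab-separated table and reports the best and final evaluations; TABLE holds only a few rows copied verbatim from above, and the function names are hypothetical.

```python
import csv
import io

# A few rows copied verbatim from the table above (tab-separated).
TABLE = """Training Loss\tEpoch\tStep\tValidation Loss
No log\t0.0005\t1\t0.5617
0.0984\t1.5034\t3300\t0.0571
0.1497\t1.7768\t3900\t0.0584
0.0544\t2.9613\t6500\t0.0770
"""


def parse_results(text: str) -> list[dict]:
    """Parse the Trainer results table; a 'No log' training loss becomes None."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text), delimiter="\t"):
        train = rec["Training Loss"]
        rows.append({
            "train_loss": None if train == "No log" else float(train),
            "epoch": float(rec["Epoch"]),
            "step": int(rec["Step"]),
            "val_loss": float(rec["Validation Loss"]),
        })
    return rows


def best_eval(rows: list[dict]) -> dict:
    """Row with the lowest validation loss."""
    return min(rows, key=lambda r: r["val_loss"])


if __name__ == "__main__":
    rows = parse_results(TABLE)
    best = best_eval(rows)
    print(f"best val loss {best['val_loss']} at step {best['step']} (epoch {best['epoch']})")
    print(f"final val loss {rows[-1]['val_loss']} at step {rows[-1]['step']}")
```

Pasting the full table into TABLE yields the same best row, since 0.0571 is the minimum over all 66 logged evaluations.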

Framework versions

PEFT 0.15.2
Transformers 4.51.3
Pytorch 2.6.0+cu124
Datasets 3.5.1
Tokenizers 0.21.1
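Matching these versions makes a rerun easier to compare. The sketch below checks installed packages with the standard library's `importlib.metadata.version`. Mapping "Pytorch" to the `torch` distribution is the usual PyPI name; the comparison deliberately ignores the local `+cu124` label, since that tag identifies the CUDA build rather than a different Python-level release. `check_versions` and `public_version` are hypothetical helpers.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed under "Framework versions" (PyPI distribution names).
EXPECTED = {
    "peft": "0.15.2",
    "transformers": "4.51.3",
    "torch": "2.6.0+cu124",
    "datasets": "3.5.1",
    "tokenizers": "0.21.1",
}


def public_version(v: str) -> str:
    """Drop a PEP 440 local label such as '+cu124' before comparing."""
    return v.split("+", 1)[0]


def check_versions(expected: dict[str, str], lookup=version) -> dict[str, tuple[str, str]]:
    """Return {package: (expected, installed)} for every mismatch or missing package."""
    mismatches = {}
    for pkg, want in expected.items():
        try:
            have = lookup(pkg)
        except PackageNotFoundError:
            have = "not installed"
        if public_version(have) != public_version(want):
            mismatches[pkg] = (want, have)
    return mismatches


if __name__ == "__main__":
    for pkg, (want, have) in check_versions(EXPECTED).items():
        print(f"{pkg}: card lists {want}, found {have}")
```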


Model size: 15B params (Safetensors)

Tensor type: BF16


Model tree for dywoo/Qwen3-14B-V2-Plan

Base model: Qwen/Qwen3-14B-Base
Finetuned: Qwen/Qwen3-14B
Adapter: this model