# Fine-tuned model (merged)

- Base model: Qwen/Qwen3-4B-Instruct-2507
- Dataset: u-10bei/sft_alfworld_trajectory_dataset_v5
- Method: SFT (assistant-only loss)
- Format: Merged full model
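The "assistant-only loss" means that only tokens belonging to assistant turns contribute to the cross-entropy loss; prompt and user tokens are masked out. A minimal sketch of that masking (hypothetical illustration, not the actual `train_sft.py` code):

```python
# Assistant-only loss masking: set labels to -100 (PyTorch's default
# ignore_index for cross-entropy) everywhere outside assistant turns,
# so only assistant tokens are trained on.

IGNORE_INDEX = -100

def mask_labels(token_ids, assistant_mask):
    """Keep labels only where assistant_mask is True; ignore the rest."""
    return [t if m else IGNORE_INDEX for t, m in zip(token_ids, assistant_mask)]

# Toy example: 6 tokens, the last 3 belong to the assistant turn.
tokens = [101, 102, 103, 201, 202, 203]
mask = [False, False, False, True, True, True]
labels = mask_labels(tokens, mask)
print(labels)  # [-100, -100, -100, 201, 202, 203]
```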
## Training config (key)
- max_seq_len: 1024
- epochs: 1
- per_device_train_bs: 1
- grad_accum: 4
- lr: 2e-06
- warmup_ratio: 0.1
- weight_decay: 0.05
- lora_r: 64
- lora_alpha: 128
- lora_dropout: 0.0
- target_modules: q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj
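Two quantities implied by these numbers, shown in a small sketch (plain arithmetic on the config values above, no model download needed): the effective per-device batch size, and the LoRA scaling factor applied in the standard update W' = W + (alpha / r) · BA.

```python
# Derived quantities from the training config above.

lora_r, lora_alpha = 64, 128
# Standard LoRA scaling: the low-rank update B @ A is multiplied by alpha / r.
scaling = lora_alpha / lora_r  # 128 / 64 = 2.0

per_device_train_bs, grad_accum = 1, 4
# Effective batch size per device (before any data parallelism).
effective_bs = per_device_train_bs * grad_accum  # 1 * 4 = 4

print(f"LoRA scaling: {scaling}, effective batch size: {effective_bs}")
```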
Generated by `train_sft.py`.