---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/sft_alfworld_trajectory_dataset_v3
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4
language:
- en
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- agent
- alfworld
- dbbench
---

# qwen3-4b-agent-trajectory-lora-test04

LoRA adapter fine-tuned from **Qwen/Qwen3-4B-Instruct-2507** using Unsloth.

## Key Improvement (vs test03)

The ALFWorld training data (v3/v5) was in function-calling format, but the evaluation environment expects ReAct text format (THOUGHT/ACTION). This mismatch caused test03 to score significantly lower than test02. test04 converts the ALFWorld data to ReAct format before training:

- `assistant` + `tool_calls` → `THOUGHT: \nACTION: ` text
- `tool` role → `user` role (matching the eval environment's observation format)

## Training Configuration

- Base model: Qwen/Qwen3-4B-Instruct-2507
- Data: ALFWorld (v3+v5, ReAct-converted) + DBBench v4 × 3
- Max sequence length: 2048
- Epochs: 2
- Learning rate: 2e-06
- LoRA: r=64, alpha=128

## Sources

- u-10bei/sft_alfworld_trajectory_dataset_v3
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4
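
## Conversion Sketch

The format conversion described above can be sketched as a small preprocessing function. This is a minimal illustration, not the exact script used for this run: the field names (`role`, `content`, `tool_calls`, `function`) assume an OpenAI-style chat schema, and the actual dataset layout may differ.

```python
def to_react(messages):
    """Convert a function-calling trajectory to ReAct-style text turns.

    - assistant turns with tool_calls become "THOUGHT: ...\nACTION: ..." text
    - tool observations become user turns, matching the eval environment
    """
    converted = []
    for msg in messages:
        if msg["role"] == "assistant" and msg.get("tool_calls"):
            thought = msg.get("content") or ""
            # Take the first tool call as the action string.
            call = msg["tool_calls"][0]["function"]
            action = f'{call["name"]} {call["arguments"]}'.strip()
            converted.append({
                "role": "assistant",
                "content": f"THOUGHT: {thought}\nACTION: {action}",
            })
        elif msg["role"] == "tool":
            # Observation feedback is presented as a user message.
            converted.append({"role": "user", "content": msg["content"]})
        else:
            converted.append(msg)
    return converted
```

Applied to every ALFWorld trajectory before tokenization, this keeps the supervised targets in the same THOUGHT/ACTION text format the evaluation harness parses.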