qwen3-4b-agent-trajectory-lora-test04
LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Unsloth.
Key Improvement (vs test03)
ALFWorld training data (v3/v5) was in function-calling format, but the evaluation environment expects ReAct text format (THOUGHT/ACTION). This mismatch caused test03 to score significantly lower than test02.
test04 converts ALFWorld data to ReAct format before training:
- assistant messages with tool_calls → `THOUGHT: <think>\nACTION: <action>` text
- tool role → user role (matching the eval environment's observation format)
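The conversion above can be sketched as follows. This is a minimal illustration, not the actual training script; the message schema and the `to_react` helper are assumptions based on the mapping described above.

```python
def to_react(messages):
    """Hypothetical sketch: convert tool-call style messages to ReAct text turns.

    - assistant + tool_calls -> one assistant turn of "THOUGHT: ...\nACTION: ..."
    - tool role -> user role, matching the eval environment's observations
    """
    converted = []
    for msg in messages:
        if msg["role"] == "assistant" and msg.get("tool_calls"):
            thought = msg.get("content") or ""
            # Assumed layout: the env command lives in the first tool call's arguments
            action = msg["tool_calls"][0]["function"]["arguments"]
            converted.append({
                "role": "assistant",
                "content": f"THOUGHT: {thought}\nACTION: {action}",
            })
        elif msg["role"] == "tool":
            # Tool observations become plain user turns
            converted.append({"role": "user", "content": msg["content"]})
        else:
            converted.append(dict(msg))
    return converted
```

Run over every trajectory before tokenization, the result is plain-text turns the ReAct eval loop can parse directly.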
Training Configuration
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Data: ALFWorld (v3+v5, ReAct-converted) + DBBench v4 × 3
- Max sequence length: 2048
- Epochs: 2
- Learning rate: 2e-06
- LoRA: r=64, alpha=128
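For intuition on the r=64, alpha=128 setting: LoRA adds a low-rank update scaled by alpha/r (here 2.0) on top of the frozen base weight. A minimal numeric sketch, with illustrative shapes rather than the model's real dimensions:

```python
import numpy as np

d, r, alpha = 256, 64, 128           # d is illustrative; r/alpha match the card
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))          # frozen base weight
A = rng.normal(size=(r, d)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                 # B is initialized to zero, so the update starts at 0

scaling = alpha / r                  # 128 / 64 = 2.0
W_eff = W + scaling * (B @ A)        # effective weight used at inference
```

With alpha = 2r, each learned low-rank step is applied at twice its raw magnitude, a common choice when pairing a larger rank with a small learning rate like 2e-06.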
Sources
- u-10bei/sft_alfworld_trajectory_dataset_v3
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4
Model tree for peanuts33/advanced-test04
- Base model: Qwen/Qwen3-4B-Instruct-2507