---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/sft_alfworld_trajectory_dataset_v3
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4
language:
- en
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- agent
- alfworld
- dbbench
---

# qwen3-4b-agent-trajectory-lora-test04

LoRA adapter fine-tuned from **Qwen/Qwen3-4B-Instruct-2507** using Unsloth.
## Key Improvement (vs test03)

The ALFWorld training data (v3/v5) was in function-calling format, but the evaluation
environment expects ReAct text format (THOUGHT/ACTION). This mismatch caused test03
to score significantly lower than test02.

test04 converts the ALFWorld data to ReAct format before training:

- `assistant` + `tool_calls` → `THOUGHT: <think>\nACTION: <action>` text
- `tool` role → `user` role (matching the eval environment's observation format)
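The conversion above can be sketched as a small message-rewriting pass. This is a minimal sketch, not the card's actual preprocessing code: the `tool_calls`/`arguments` field names assume the common OpenAI-style function-calling schema, and the `"action"` argument key is hypothetical, since the exact dataset schema is not shown here.

```python
import json

def to_react(messages):
    """Convert function-calling chat messages to ReAct-format turns.

    Assumes each assistant turn carries its reasoning in `content` and a
    single tool call in `tool_calls` (hypothetical schema, not confirmed
    by this card).
    """
    converted = []
    for msg in messages:
        if msg["role"] == "assistant" and msg.get("tool_calls"):
            # assistant + tool_calls -> "THOUGHT: ...\nACTION: ..." text
            call = msg["tool_calls"][0]["function"]
            args = json.loads(call["arguments"])
            action = args.get("action", call["name"])
            converted.append({
                "role": "assistant",
                "content": f"THOUGHT: {msg.get('content') or ''}\nACTION: {action}",
            })
        elif msg["role"] == "tool":
            # Tool observations become plain user turns, matching the
            # eval environment's observation format.
            converted.append({"role": "user", "content": msg["content"]})
        else:
            converted.append(dict(msg))
    return converted
```

Applied to a trajectory, an assistant turn such as `content="I should open the fridge."` with a tool call whose arguments are `{"action": "open fridge 1"}` becomes the single text turn `THOUGHT: I should open the fridge.\nACTION: open fridge 1`, and the following `tool` observation is re-labeled as a `user` message.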
## Training Configuration

- Base model: Qwen/Qwen3-4B-Instruct-2507
- Data: ALFWorld (v3 + v5, ReAct-converted) + DBBench v4 ×3
- Max sequence length: 2048
- Epochs: 2
- Learning rate: 2e-6
- LoRA: r=64, alpha=128
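For reference, the LoRA settings above map onto a peft-style adapter config roughly as follows. This is a sketch under assumptions: `target_modules` (the usual attention/MLP projections for Qwen-style models) and `lora_dropout` are not stated on this card.

```python
# LoRA hyperparameters from the card; target_modules and dropout are
# assumptions, shown here as a plain dict mirroring peft's LoraConfig fields.
lora_cfg = {
    "r": 64,
    "lora_alpha": 128,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj"],
    "lora_dropout": 0.0,  # assumption, not from the card
}

# Effective multiplier applied to the low-rank update: alpha / r
scaling = lora_cfg["lora_alpha"] / lora_cfg["r"]  # 128 / 64 = 2.0
```

Note that alpha = 2r gives a scaling factor of 2.0, a common choice that amplifies the adapter update relative to the alpha = r baseline.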
## Sources

- u-10bei/sft_alfworld_trajectory_dataset_v3
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4