Text Generation
PEFT
Safetensors
English
qwen3
lora
agent
alfworld
dbbench
conversational
advanced-test04 / README.md
peanuts33's picture
test04: ALFWorld ReAct-converted mixed SFT
37124c5 verified
metadata
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
  - u-10bei/sft_alfworld_trajectory_dataset_v3
  - u-10bei/sft_alfworld_trajectory_dataset_v5
  - u-10bei/dbbench_sft_dataset_react_v4
language:
  - en
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
  - lora
  - agent
  - alfworld
  - dbbench

qwen3-4b-agent-trajectory-lora-test04

LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Unsloth.

Key Improvement (vs test03)

ALFWorld training data (v3/v5) was in function-calling format, but the evaluation environment expects ReAct text format (THOUGHT/ACTION). This mismatch caused test03 to score significantly lower than test02.

test04 converts ALFWorld data to ReAct format before training:

  • assistant + tool_callsTHOUGHT: <think>\nACTION: <action> text
  • tool role → user role (matching eval environment observation format)

Training Configuration

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Data: ALFWorld (v3+v5, ReAct-converted) + DBBench v4 × 3
  • Max sequence length: 2048
  • Epochs: 2
  • Learning rate: 2e-06
  • LoRA: r=64, alpha=128

Sources

  • u-10bei/sft_alfworld_trajectory_dataset_v3
  • u-10bei/sft_alfworld_trajectory_dataset_v5
  • u-10bei/dbbench_sft_dataset_react_v4