qwen3-4b-agent-trajectory-lora-test04
LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Unsloth.
Key Improvement (vs test03)
ALFWorld training data (v3/v5) was in function-calling format, but the evaluation environment expects ReAct text format (THOUGHT/ACTION). This mismatch caused test03 to score significantly lower than test02.
test04 converts ALFWorld data to ReAct format before training:
- assistant messages with tool_calls → `THOUGHT: <think>\nACTION: <action>` text
- tool role → user role (matching the eval environment's observation format)
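The conversion above can be sketched as follows. This is a minimal illustration, not the actual training script; the message schema and the `to_react` helper are assumptions based on the mapping described above.

```python
def to_react(messages):
    """Hypothetical sketch: convert tool-call style messages to ReAct text turns.

    - assistant + tool_calls -> one assistant turn of "THOUGHT: ...\nACTION: ..."
    - tool role -> user role, matching the eval environment's observations
    """
    converted = []
    for msg in messages:
        if msg["role"] == "assistant" and msg.get("tool_calls"):
            thought = msg.get("content") or ""
            # Assumed layout: the env command lives in the first tool call's arguments
            action = msg["tool_calls"][0]["function"]["arguments"]
            converted.append({
                "role": "assistant",
                "content": f"THOUGHT: {thought}\nACTION: {action}",
            })
        elif msg["role"] == "tool":
            # Tool observations become plain user turns
            converted.append({"role": "user", "content": msg["content"]})
        else:
            converted.append(dict(msg))
    return converted
```

Run over every trajectory before tokenization, the result is plain-text turns the ReAct eval loop can parse directly.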
Training Configuration
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Data: ALFWorld (v3+v5, ReAct-converted) + DBBench v4 × 3
- Max sequence length: 2048
- Epochs: 2
- Learning rate: 2e-06
- LoRA: r=64, alpha=128
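For intuition on the r=64, alpha=128 setting: LoRA adds a low-rank update scaled by alpha/r (here 2.0) on top of the frozen base weight. A minimal numeric sketch, with illustrative shapes rather than the model's real dimensions:

```python
import numpy as np

d, r, alpha = 256, 64, 128           # d is illustrative; r/alpha match the card
rng = np.random.default_rng(0)

W = rng.normal(size=(d, d))          # frozen base weight
A = rng.normal(size=(r, d)) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))                 # B is initialized to zero, so the update starts at 0

scaling = alpha / r                  # 128 / 64 = 2.0
W_eff = W + scaling * (B @ A)        # effective weight used at inference
```

With alpha = 2r, each learned low-rank step is applied at twice its raw magnitude, a common choice when pairing a larger rank with a small learning rate like 2e-06.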
Sources
- u-10bei/sft_alfworld_trajectory_dataset_v3
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4
Model tree for peanuts33/advanced-test04
- Base model: Qwen/Qwen3-4B-Instruct-2507