Text Generation
PEFT
Safetensors
English
qwen3
lora
agent
alfworld
dbbench
conversational
File size: 1,247 Bytes
37124c5
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/sft_alfworld_trajectory_dataset_v3
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4
language:
- en
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- agent
- alfworld
- dbbench
---

# qwen3-4b-agent-trajectory-lora-test04

LoRA adapter fine-tuned from **Qwen/Qwen3-4B-Instruct-2507** using Unsloth.

## Key Improvement (vs test03)

ALFWorld training data (v3/v5) was in function-calling format, but the evaluation
environment expects ReAct text format (THOUGHT/ACTION). This mismatch caused test03
to score significantly lower than test02.

test04 converts ALFWorld data to ReAct format before training:
- `assistant` + `tool_calls``THOUGHT: <think>\nACTION: <action>` text
- `tool` role → `user` role (matching eval environment observation format)

## Training Configuration

- Base model: Qwen/Qwen3-4B-Instruct-2507
- Data: ALFWorld (v3+v5, ReAct-converted) + DBBench v4 × 3
- Max sequence length: 2048
- Epochs: 2
- Learning rate: 2e-06
- LoRA: r=64, alpha=128

## Sources

- u-10bei/sft_alfworld_trajectory_dataset_v3
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4