---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/sft_alfworld_trajectory_dataset_v3
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4
language:
- en
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- agent
- alfworld
- dbbench
---

# qwen3-4b-agent-trajectory-lora-test04

LoRA adapter fine-tuned from **Qwen/Qwen3-4B-Instruct-2507** using Unsloth.
## Key Improvement (vs test03)

The ALFWorld training data (v3/v5) was in function-calling format, but the evaluation
environment expects ReAct text format (THOUGHT/ACTION). This mismatch caused test03
to score significantly lower than test02.

test04 converts the ALFWorld data to ReAct format before training:

- `assistant` + `tool_calls` → `THOUGHT: <think>\nACTION: <action>` text
- `tool` role → `user` role (matching the eval environment's observation format)
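The conversion above can be sketched as a small message-rewriting pass. This is a minimal sketch, not the card's actual preprocessing code: the `tool_calls`/`arguments` field names assume the common OpenAI-style function-calling schema, and the `"action"` argument key is hypothetical, since the exact dataset schema is not shown here.

```python
import json

def to_react(messages):
    """Convert function-calling chat messages to ReAct-format turns.

    Assumes each assistant turn carries its reasoning in `content` and a
    single tool call in `tool_calls` (hypothetical schema, not confirmed
    by this card).
    """
    converted = []
    for msg in messages:
        if msg["role"] == "assistant" and msg.get("tool_calls"):
            # assistant + tool_calls -> "THOUGHT: ...\nACTION: ..." text
            call = msg["tool_calls"][0]["function"]
            args = json.loads(call["arguments"])
            action = args.get("action", call["name"])
            converted.append({
                "role": "assistant",
                "content": f"THOUGHT: {msg.get('content') or ''}\nACTION: {action}",
            })
        elif msg["role"] == "tool":
            # Tool observations become plain user turns, matching the
            # eval environment's observation format.
            converted.append({"role": "user", "content": msg["content"]})
        else:
            converted.append(dict(msg))
    return converted
```

Applied to a trajectory, an assistant turn such as `content="I should open the fridge."` with a tool call whose arguments are `{"action": "open fridge 1"}` becomes the single text turn `THOUGHT: I should open the fridge.\nACTION: open fridge 1`, and the following `tool` observation is re-labeled as a `user` message.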
## Training Configuration

- Base model: Qwen/Qwen3-4B-Instruct-2507
- Data: ALFWorld (v3 + v5, ReAct-converted) + DBBench v4 ×3
- Max sequence length: 2048
- Epochs: 2
- Learning rate: 2e-6
- LoRA: r=64, alpha=128
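For reference, the LoRA settings above map onto a peft-style adapter config roughly as follows. This is a sketch under assumptions: `target_modules` (the usual attention/MLP projections for Qwen-style models) and `lora_dropout` are not stated on this card.

```python
# LoRA hyperparameters from the card; target_modules and dropout are
# assumptions, shown here as a plain dict mirroring peft's LoraConfig fields.
lora_cfg = {
    "r": 64,
    "lora_alpha": 128,
    "target_modules": ["q_proj", "k_proj", "v_proj", "o_proj",
                       "gate_proj", "up_proj", "down_proj"],
    "lora_dropout": 0.0,  # assumption, not from the card
}

# Effective multiplier applied to the low-rank update: alpha / r
scaling = lora_cfg["lora_alpha"] / lora_cfg["r"]  # 128 / 64 = 2.0
```

Note that alpha = 2r gives a scaling factor of 2.0, a common choice that amplifies the adapter update relative to the alpha = r baseline.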
## Sources

- u-10bei/sft_alfworld_trajectory_dataset_v3
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4