Qwen3-4B Dual-Skill Agent (ALFWorld & DBBench) LoRA

This repository provides a Dual-Skill LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507. It is specifically optimized for two distinct agentic tasks: Household operations (ALFWorld) and Database interactions (DBBench).

Key Improvements & Features

Multi-task Generalization: Balanced training on both ALFWorld and DBBench, allowing the model to switch contexts based on the system prompt.
Optimized Trajectories: All training data was pre-cleaned to a maximum of 3072 tokens to ensure high-density learning without truncation of critical terminal actions.
Assistant-Only Loss: Fine-tuned using a specialized collator that applies loss only to assistant turns (THOUGHT/ACTION), preventing the model from memorizing environment descriptions.
Robustness: Includes error-recovery trajectories where the agent learns to correct its path after receiving "Nothing happened" or SQL errors from the environment.
Cleaner Reasoning: Removed potential "tools" format traps to align strictly with the AgentBench evaluation parser.
Data Shuffling: Integrated a mechanism to completely shuffle ALFWorld and DBBench data, eliminating bias during training and preventing catastrophic forgetting.

Training Configuration

Parameter	Value
Base model	Qwen/Qwen3-4B-Instruct-2507
Hardware	NVIDIA A100 SXM4 40GB
Precision	bfloat16
Max context length	3072 tokens
Epochs	2
Learning rate	2e-06
Batch size (effective)	8
LoRA Rank / Alpha	r=64 / a=128
Target Modules	All Linear Layers (Q,K,V,O,Gate,Up,Down)

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "your_id/your-repo-name"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)

Training Data Sources

ALFWorld Cleaned: mark-22/alfworld_cleaned_for_agentbench - Focused on household task completion and navigation.
DBBench Cleaned: mark-22/dbbench_cleaned_for_agentbench - Focused on SQL generation and database manipulation (UPDATE/SELECT).

License

This adapter is distributed under the Apache-2.0 license. Please ensure compliance with the base model's usage terms.

Downloads last month: -

Safetensors

Model size

4B params

Tensor type

BF16

Model tree for mark-22/qwen3-4b-agent-trajectory-lora

Base model

Qwen/Qwen3-4B-Instruct-2507

Adapter

(5603)

this model

mark-22
/

qwen3-4b-agent-trajectory-lora