qwen3-4b-agent-trajectory-lora-mixed-alf07-db03
This repository provides a LoRA adapter fine-tuned from
Qwen/Qwen3-4B-Instruct-2507 using Unsloth.
It contains the LoRA adapter weights only; the base model
must be loaded separately.
Training Objective
This adapter is trained to improve multi-turn agent task performance
on two complementary task families:
1. ALFWorld (Embodied Household Tasks)
- Object manipulation
- Sequential action planning
- Observation → Action → Feedback loop
2. DBBench (Database Reasoning Tasks)
- SQL generation
- Schema exploration
- Tool use & error recovery
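The Observation → Action → Feedback loop above can be sketched as a chat-style trajectory. This is a minimal illustration; the role names and helper below are assumptions, not the repository's actual training schema:

```python
# Sketch of a multi-turn agent trajectory in chat-message form.
# Environment observations arrive as user turns; agent actions are
# assistant turns. The schema here is illustrative only.

def build_trajectory(observations, actions):
    """Interleave environment observations (user turns) with agent
    actions (assistant turns) into one chat trajectory."""
    assert len(observations) == len(actions)
    messages = [{"role": "system", "content": "You are a household agent."}]
    for obs, act in zip(observations, actions):
        messages.append({"role": "user", "content": obs})       # observation / feedback
        messages.append({"role": "assistant", "content": act})  # chosen action
    return messages

traj = build_trajectory(
    ["You are in the kitchen. You see a mug on the table."],
    ["take mug from table"],
)
```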
Dataset Composition
The training dataset is a mixed trajectory dataset constructed by sampling:
- ALFWorld : DBBench = 0.7 : 0.3
This mixture ratio is chosen to balance:
- high success rate in ALFWorld (action planning ability)
- high SQL accuracy in DBBench (symbolic reasoning ability)
This design empirically improves overall AgentBench performance by avoiding
over-specialization to either domain.
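One way the 0.7 : 0.3 mixture could be realized is per-example sampling. The actual sampling code is not published with this card, so the following is only a sketch:

```python
import random

def mix_trajectories(alfworld, dbbench, n, ratio=0.7, seed=0):
    """Draw n trajectories with replacement: each draw picks an
    ALFWorld example with probability `ratio`, DBBench otherwise."""
    rng = random.Random(seed)
    mixed = []
    for _ in range(n):
        pool = alfworld if rng.random() < ratio else dbbench
        mixed.append(rng.choice(pool))
    return mixed

# Toy pools stand in for the real trajectory datasets.
mixed = mix_trajectories(["alf"] * 10, ["db"] * 10, n=1000)
```

With a large sample the empirical ALFWorld fraction converges to the configured 0.7.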
Training Configuration
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA (full precision base model)
- Max sequence length: 2048
- Epochs: 2
- Learning rate: 2e-6
- Warmup ratio: 0.05
- Weight decay: 0.05
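Assuming a standard Hugging Face Trainer setup (the trainer actually used is not specified here), the hyperparameters above map onto transformers.TrainingArguments roughly as follows; output_dir is a placeholder and the max sequence length of 2048 is enforced at tokenization time, not here:

```python
from transformers import TrainingArguments

# Sketch only: values below the comment line are taken from the card;
# output_dir is a placeholder assumption.
args = TrainingArguments(
    output_dir="outputs",
    num_train_epochs=2,
    learning_rate=2e-6,
    warmup_ratio=0.05,
    weight_decay=0.05,
)
```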
LoRA configuration
- Rank (r): 96
- Alpha: 128
- Dropout: 0.06
- Target modules:
- q_proj
- k_proj
- v_proj
- o_proj
- gate_proj
- up_proj
- down_proj
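The settings above correspond to a peft LoraConfig like the following (a sketch that reproduces the listed values):

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=96,                 # LoRA rank
    lora_alpha=128,       # scaling factor
    lora_dropout=0.06,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```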
Dataset mixture
- ALFWorld 70%
- DBBench 30%
Training Strategy
The training uses assistant-only loss on all assistant turns in multi-turn trajectories.
This enables the model to learn:
- environment observation understanding
- correct action selection
- tool usage
- error recovery
- structured SQL reasoning
Loss is NOT applied to user or system tokens, which improves generalization.
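Assistant-only loss is typically implemented by setting label ids to -100 (the ignore index of PyTorch's cross-entropy) for every non-assistant token. A minimal sketch, assuming a per-token role annotation that is not part of this repository:

```python
IGNORE_INDEX = -100  # tokens with this label are skipped by cross-entropy

def mask_labels(token_ids, roles):
    """Copy token ids into labels, masking everything that is not
    part of an assistant turn. `roles` gives one role per token."""
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, roles)
    ]

labels = mask_labels(
    [11, 12, 13, 14, 15],
    ["system", "user", "assistant", "assistant", "user"],
)
# → [-100, -100, 13, 14, -100]
```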
Expected Behavior
This adapter is optimized for:
- multi-step reasoning
- tool-augmented interaction
- SQL query construction
- embodied action planning
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "todalaba/qwen3-4b-agent-trajectory-lora-mixed-alf07-db03"

# Load the base model, then attach the LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
Model tree for hiroshij/test105-todalaba-0222-012
Base model
Qwen/Qwen3-4B-Instruct-2507