qwen3-4b-agent-trajectory-lora-mixed-alf07-db03

This repository provides a LoRA adapter fine-tuned from
Qwen/Qwen3-4B-Instruct-2507 using LoRA + Unsloth.

This repository contains LoRA adapter weights only.
The base model must be loaded separately.


Training Objective

This adapter is trained to improve multi-turn agent task performance
on two complementary task families:

1. ALFWorld (Embodied Household Tasks)

  • Object manipulation
  • Sequential action planning
  • Observation → Action → Feedback loop

2. DBBench (Database Reasoning Tasks)

  • SQL generation
  • Schema exploration
  • Tool use & error recovery
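The Observation → Action → Feedback loop the adapter is trained on can be sketched roughly as follows. `ToyEnv` and `ToyAgent` are hypothetical stand-ins for illustration only; they are not part of this repository.

```python
# Minimal sketch of an observation -> action -> feedback loop.
# ToyEnv and ToyAgent are illustrative placeholders, not real APIs.
def run_episode(env, agent, max_steps=10):
    observation = env.reset()
    for _ in range(max_steps):
        action = agent.act(observation)       # model selects next action
        observation, done = env.step(action)  # environment returns feedback
        if done:
            break
    return observation

class ToyEnv:
    def __init__(self):
        self.state = 0
    def reset(self):
        self.state = 0
        return self.state
    def step(self, action):
        self.state += action
        return self.state, self.state >= 3  # done after reaching 3

class ToyAgent:
    def act(self, observation):
        return 1  # always take one step forward

final = run_episode(ToyEnv(), ToyAgent())
# final -> 3
```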

Dataset Composition

The training dataset is a mixed trajectory dataset constructed by sampling:

  • ALFWorld : DBBench = 0.7 : 0.3

This mixture ratio is chosen to balance:

  • high success rate in ALFWorld (action planning ability)
  • high SQL accuracy in DBBench (symbolic reasoning ability)

In our evaluation, this design improved overall AgentBench performance by
avoiding over-specialization to either domain.
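The 0.7 : 0.3 sampling described above can be sketched as follows. The function name and the toy trajectory pools are illustrative, not the actual dataset-construction code.

```python
import random

# Illustrative sketch of 0.7 : 0.3 trajectory mixing.
# sample_mixed_dataset and the toy pools below are hypothetical names.
def sample_mixed_dataset(alfworld, dbbench, n, seed=0):
    """Draw n trajectories, ~70% from ALFWorld and ~30% from DBBench."""
    rng = random.Random(seed)
    mixed = []
    for _ in range(n):
        pool = alfworld if rng.random() < 0.7 else dbbench
        mixed.append(rng.choice(pool))
    return mixed

alf = [{"task": "alfworld", "id": i} for i in range(100)]
db = [{"task": "dbbench", "id": i} for i in range(100)]
mixed = sample_mixed_dataset(alf, db, 1000)
share = sum(t["task"] == "alfworld" for t in mixed) / len(mixed)
```

With a large enough sample, `share` lands close to the target 0.7.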


Training Configuration

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Method: LoRA (full precision base model)
  • Max sequence length: 2048
  • Epochs: 2
  • Learning rate: 2e-6
  • Warmup ratio: 0.05
  • Weight decay: 0.05
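The hyperparameters above could be expressed as transformers `TrainingArguments` like the sketch below. The actual run used Unsloth, so the real training script differs; the `output_dir` path is illustrative.

```python
# Hypothetical mapping of the listed hyperparameters onto
# transformers.TrainingArguments; not the actual training script.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="outputs",   # illustrative path
    num_train_epochs=2,
    learning_rate=2e-6,
    warmup_ratio=0.05,
    weight_decay=0.05,
)
```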

LoRA configuration

  • Rank (r): 96
  • Alpha: 128
  • Dropout: 0.06
  • Target modules:
    • q_proj
    • k_proj
    • v_proj
    • o_proj
    • gate_proj
    • up_proj
    • down_proj
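The LoRA settings above correspond to a peft `LoraConfig` roughly like this sketch (a reconstruction, not the exact training code):

```python
# Hypothetical reconstruction of the LoRA configuration using peft.
from peft import LoraConfig

lora_config = LoraConfig(
    r=96,
    lora_alpha=128,
    lora_dropout=0.06,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```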

Dataset mixture

  • ALFWorld 70%
  • DBBench 30%

Training Strategy

Training uses an assistant-only loss: the loss is computed on every assistant turn in each multi-turn trajectory.

This enables the model to learn:

  • environment observation understanding
  • correct action selection
  • tool usage
  • error recovery
  • structured SQL reasoning

Loss is not applied to user or system tokens, which prevents the model from learning to reproduce prompt text and improves generalization.
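In practice, assistant-only loss is usually implemented by setting the labels of non-assistant tokens to the ignore index (-100). The sketch below uses made-up token IDs and a hypothetical helper to illustrate the masking; it is not the actual training code.

```python
# Illustrative sketch of assistant-only loss masking.
# Token IDs and role spans are made up; the point is that labels for
# non-assistant tokens are set to -100 so the loss ignores them.
IGNORE_INDEX = -100

def mask_non_assistant(token_ids, roles):
    """roles[i] is 'system', 'user', or 'assistant' for token_ids[i]."""
    return [tok if role == "assistant" else IGNORE_INDEX
            for tok, role in zip(token_ids, roles)]

tokens = [101, 7592, 2088, 102, 3449, 2003]
roles = ["system", "user", "user", "assistant", "assistant", "assistant"]
labels = mask_non_assistant(tokens, roles)
# labels -> [-100, -100, -100, 102, 3449, 2003]
```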


Expected Behavior

This adapter is optimized for:

  • multi-step reasoning
  • tool-augmented interaction
  • SQL query construction
  • embodied action planning

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "todalaba/qwen3-4b-agent-trajectory-lora-mixed-alf07-db03"

tokenizer = AutoTokenizer.from_pretrained(base)

# Load the base model, then attach the LoRA adapter on top of it.
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)

# Example generation via the tokenizer's chat template (prompt is illustrative):
messages = [{"role": "user", "content": "List the tables in the database."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))