Qwen3-4B Agent Trajectory (v12)

This repository provides a fully merged model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Unsloth.

Unlike standard adapter repositories, this repository contains the merged weights, meaning you do not need to load the base model separately.

Training Objective

This model is trained to improve multi-turn agent task performance on ALFWorld (household tasks) and DBBench (database operations).

Loss is applied to all assistant turns in the multi-turn trajectory, so the model learns to interpret environment observations, select actions, use tools, and recover from errors.

Data Processing

Train/Validation Split: 95% / 5%
Random Seed: 3407 (used for shuffling and initialization)
Loss Masking: Loss was computed only on the assistant's responses. User prompts and environment observations were masked during training (Unsloth's train_on_responses_only was applied with <|im_start|>assistant\n as the response marker).
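
The loss masking described above can be illustrated with a small, dependency-free sketch. This is not the training code used for this model: toy_tokenize is a hypothetical stand-in for a real tokenizer (one "token" per character), the example trajectory is invented, and the exact treatment of the trailing newline after <|im_end|> in Unsloth may differ. It only shows the idea of keeping labels on assistant responses and setting everything else to -100.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def toy_tokenize(text):
    # Hypothetical stand-in for a real tokenizer: one "token id" per character.
    return [ord(ch) for ch in text]


def build_labels(messages):
    """Return (input_ids, labels) with loss kept only on assistant responses."""
    input_ids, labels = [], []
    for msg in messages:
        # ChatML-style rendering of one turn: header, then content and end marker.
        header = toy_tokenize(f"<|im_start|>{msg['role']}\n")
        body = toy_tokenize(f"{msg['content']}<|im_end|>\n")
        input_ids += header + body
        if msg["role"] == "assistant":
            # The role header is context; only the response itself is a target.
            labels += [IGNORE_INDEX] * len(header) + body
        else:
            # User prompts and observations contribute no loss.
            labels += [IGNORE_INDEX] * (len(header) + len(body))
    return input_ids, labels


# Invented multi-turn trajectory, for illustration only.
trajectory = [
    {"role": "user", "content": "List the tables."},
    {"role": "assistant", "content": "Action: SHOW TABLES;"},
    {"role": "user", "content": "Observation: users, orders"},
    {"role": "assistant", "content": "Final Answer: users, orders"},
]

input_ids, labels = build_labels(trajectory)
# Decode only the supervised positions to see what the model is trained to produce.
supervised = "".join(chr(t) for t in labels if t != IGNORE_INDEX)
print(supervised)
```

Every assistant turn in the trajectory stays supervised, not just the final one, which matches the multi-turn objective stated in this card.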

Training Configuration

Base model: Qwen/Qwen3-4B-Instruct-2507
Method: LoRA + Unsloth (Merged in 16-bit)
Max sequence length: 8192
Epochs: 1
Learning rate: 1e-05
LoRA: r=16, alpha=32
Per-device train batch size: 4
Gradient accumulation steps: 2
Warmup ratio: 0.1
Weight decay: 0.05
NEFTune noise alpha: 5.0
Validation ratio: 0.05
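
A few derived quantities follow from the configuration above. The sketch below is only arithmetic on the listed values. NUM_DEVICES and NUM_TRAIN_EXAMPLES are assumptions made for illustration (the card does not state the GPU count or the dataset size), and the LoRA scale assumes the standard alpha / r scaling rather than a rank-stabilized variant.

```python
import math

# Values taken from the configuration listed in this card.
PER_DEVICE_TRAIN_BATCH_SIZE = 4
GRAD_ACCUM = 2
WARMUP_RATIO = 0.1
EPOCHS = 1
LORA_R = 16
LORA_ALPHA = 32

# Assumptions for illustration only; not stated in the card.
NUM_DEVICES = 1
NUM_TRAIN_EXAMPLES = 1000

# Sequences contributing to each optimizer step.
effective_batch = PER_DEVICE_TRAIN_BATCH_SIZE * GRAD_ACCUM * NUM_DEVICES

# Approximate optimizer steps for the whole run.
steps_per_epoch = math.ceil(NUM_TRAIN_EXAMPLES / effective_batch)
total_steps = steps_per_epoch * EPOCHS

# Steps spent ramping the learning rate up from zero.
warmup_steps = math.ceil(total_steps * WARMUP_RATIO)

# Standard LoRA multiplies the low-rank update by alpha / r.
lora_scale = LORA_ALPHA / LORA_R

print(effective_batch, total_steps, warmup_steps, lora_scale)
```

On a single device the effective batch size is 8 sequences per optimizer step; the step counts scale with the real dataset size, which is not given here.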

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "choco800/qwen3-4b-agent-v12"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
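
After loading, a reply can be generated through the tokenizer's chat template. The helpers below are a minimal sketch, not an official interface of this repository: build_messages and generate_reply are hypothetical names, the prompts are invented, and max_new_tokens=256 is an arbitrary choice. Both functions expect the model and tokenizer objects created by the loading code above.

```python
def build_messages(system_prompt, user_prompt):
    # Hypothetical helper: chat history in the role/content format
    # expected by tokenizer.apply_chat_template.
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


def generate_reply(model, tokenizer, messages, max_new_tokens=256):
    # Render the conversation with the model's chat template and append the
    # assistant header so that generation starts a new assistant turn.
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated part.
    prompt_len = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

For example, generate_reply(model, tokenizer, build_messages("You are a database agent.", "How many rows are in the users table?")) returns the next assistant turn as a string. In an agent loop, the environment's observation would be appended as a new user message before calling it again.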

Sources & Terms (IMPORTANT)

Training data:

u-10bei/dbbench_sft_dataset_react

Dataset License: MIT License. The training data is used and distributed under the terms of the MIT License.
Compliance: Users must comply with the dataset license and the base model's original terms of use (Apache 2.0).
