Qwen3-4B Agent Trajectory (v14)

This repository provides a fully merged model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Unsloth.

Unlike standard adapter repositories, this repository contains the merged weights, meaning you do not need to load the base model separately.

Training Objective

This model is trained to improve multi-turn agent task performance on ALFWorld (household tasks).

Loss is applied to every assistant turn in the multi-turn trajectory, so the model learns to interpret environment observations, select actions, use tools, and recover from errors.

Data Processing

  • Train/Validation Split: 95% / 5%
  • Random Seed: 3407 (used for shuffling and initialization)
  • Loss Masking: Loss was computed only on the assistant's responses. User prompts and environment observations were masked during training (train_on_responses_only was applied with <|im_start|>assistant\n as the response marker).
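
The loss-masking rule can be illustrated with a small, library-free sketch. It is not the train_on_responses_only implementation: the function names, the pre-tokenized toy conversation, and the choice to keep the closing <|im_end|> token inside the trained span are all assumptions made for illustration.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default

RESPONSE_START = ["<|im_start|>", "assistant", "\n"]  # marker that opens an assistant turn
TURN_END = "<|im_end|>"                               # token that closes any turn


def assistant_mask(tokens, response_start=RESPONSE_START, turn_end=TURN_END):
    """Return a list of booleans: True where a token should contribute to the loss."""
    keep = [False] * len(tokens)
    n = len(response_start)
    i = 0
    while i < len(tokens):
        if tokens[i:i + n] == response_start:
            # Keep everything after the marker, up to and including the turn end.
            j = i + n
            while j < len(tokens):
                keep[j] = True
                if tokens[j] == turn_end:
                    break
                j += 1
            i = j + 1
        else:
            i += 1
    return keep


def labels_from_mask(input_ids, keep):
    """Copy input ids into labels, replacing masked positions with IGNORE_INDEX."""
    return [tid if k else IGNORE_INDEX for tid, k in zip(input_ids, keep)]


# Toy two-turn trajectory (hypothetical tokens, one string per token).
tokens = [
    "<|im_start|>", "user", "\n", "go", "to", "desk", "<|im_end|>",
    "<|im_start|>", "assistant", "\n", "look", "<|im_end|>",
    "<|im_start|>", "user", "\n", "ok", "<|im_end|>",
    "<|im_start|>", "assistant", "\n", "take", "mug", "<|im_end|>",
]
keep = assistant_mask(tokens)
kept = [t for t, k in zip(tokens, keep) if k]
labels = labels_from_mask(list(range(len(tokens))), keep)
```

In this toy example only the tokens of the two assistant turns stay in `kept`; every user and observation token receives the ignore label.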

Training Configuration

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Method: LoRA + Unsloth (Merged in 16-bit)
  • Max sequence length: 8192
  • Epochs: 1
  • Learning rate: 2e-06
  • LoRA: r=16, alpha=32
  • Per-device train batch size: 4
  • Gradient accumulation steps: 4
  • Warmup ratio: 0.1
  • Weight decay: 0.05
  • NEFTune noise alpha: 5.0
  • Validation ratio: 0.05
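
As a rough guide to what the batch-size, accumulation, warmup, and LoRA settings listed above imply, the sketch below works through the arithmetic. The dataset size of 2,000 examples and the single-device setup are hypothetical (the card does not state either), the LoRA scale assumes the standard alpha / r convention, and a given trainer may round step counts slightly differently.

```python
import math

# Values taken from the configuration list; the constant names are illustrative.
PER_DEVICE_TRAIN_BATCH_SIZE = 4
GRAD_ACCUM = 4
WARMUP_RATIO = 0.1
VAL_RATIO = 0.05
LORA_R, LORA_ALPHA = 16, 32


def schedule_summary(num_examples, num_devices=1, epochs=1):
    """Derive split sizes, effective batch size, and step counts from the config."""
    num_val = int(num_examples * VAL_RATIO)      # 5% held out for validation
    num_train = num_examples - num_val           # remaining 95% used for training
    effective_batch = PER_DEVICE_TRAIN_BATCH_SIZE * GRAD_ACCUM * num_devices
    total_steps = math.ceil(num_train / effective_batch) * epochs
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    return {
        "num_train": num_train,
        "num_val": num_val,
        "effective_batch": effective_batch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
        "lora_scale": LORA_ALPHA / LORA_R,       # standard LoRA update scaling
    }


# Hypothetical dataset size, chosen only to make the numbers concrete.
summary = schedule_summary(num_examples=2000)
```

Each optimizer step therefore sees 16 sequences per device, and under the hypothetical 2,000-example dataset one epoch amounts to 119 optimizer steps, the first 12 of which are warmup.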

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "choco800/qwen3-4b-agent-v14"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
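
Once the model and tokenizer are loaded as above, a single agent step might look like the following sketch. The helper names and the prompt shape (one user turn carrying the raw observation) are assumptions; the card does not document the exact prompt format used in training, so adapt `build_messages` to match your trajectories.

```python
def build_messages(observation: str) -> list[dict]:
    # Hypothetical prompt shape: a single user turn carrying the observation.
    return [{"role": "user", "content": observation}]


def generate_action(model, tokenizer, observation: str, max_new_tokens: int = 128) -> str:
    """Run one generation step and return only the newly generated text."""
    inputs = tokenizer.apply_chat_template(
        build_messages(observation),
        add_generation_prompt=True,   # append the assistant header so the model answers
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    prompt_len = inputs["input_ids"].shape[1]
    # Drop the prompt tokens and decode only the model's continuation.
    return tokenizer.decode(output_ids[0][prompt_len:], skip_special_tokens=True)


# Example call (requires the loaded model and tokenizer):
# print(generate_action(model, tokenizer, "You are in the middle of a room."))
```

For multi-turn use, append each assistant reply and the next observation to the message list before calling the chat template again, so the model sees the full trajectory.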

Sources & Terms (IMPORTANT)

Training data:

  • u-10bei/dbbench_sft_dataset_react (available on Hugging Face Hub)
  • u-10bei/dbbench_sft_dataset_react_v2 (available on Hugging Face Hub)
  • u-10bei/dbbench_sft_dataset_react_v3 (available on Hugging Face Hub)
  • u-10bei/dbbench_sft_dataset_react_v4

Dataset License: MIT License. These datasets are used and distributed under its terms.

Compliance: Users must comply with the dataset licenses and the base model's original terms of use (Apache 2.0).
