Qwen3-4B Agent Trajectory (v14)
This repository provides a fully merged model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Unsloth.
Unlike standard adapter repositories, this repository contains the merged weights, meaning you do not need to load the base model separately.
Training Objective
This model is trained to improve multi-turn agent task performance on ALFWorld (household tasks).
Loss is applied to all assistant turns in the multi-turn trajectory, enabling the model to learn environment observation, action selection, tool use, and recovery from errors.
Data Processing
- Train/Validation Split: 95% / 5%
- Random Seed: 3407 (used for shuffling and initialization)
- Loss Masking: Loss was computed only on the assistant's responses. User prompts and observations were masked during training (train_on_responses_only was applied to <|im_start|>assistant\n).
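The masking idea can be illustrated with a small, library-free sketch. It is not Unsloth's implementation of train_on_responses_only. The token ids are hypothetical stand-ins for the real Qwen3 vocabulary, and keeping the end-of-turn token in the loss is an assumption of this sketch.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def mask_non_assistant(input_ids, response_start, turn_end):
    """Return labels that keep only assistant tokens (and the turn-end token after them)."""
    labels = [IGNORE_INDEX] * len(input_ids)
    n = len(response_start)
    i = 0
    while i < len(input_ids):
        if input_ids[i:i + n] == response_start:
            j = i + n  # first token of the assistant response
            while j < len(input_ids):
                labels[j] = input_ids[j]
                j += 1
                if input_ids[j - 1] == turn_end:
                    break  # assistant turn is finished
            i = j
        else:
            i += 1
    return labels


# Hypothetical ids: 1 = <|im_start|>, 2 = <|im_end|>, 10 = "user", 11 = "assistant", 12 = "\n"
RESPONSE_START = [1, 11, 12]  # stands in for "<|im_start|>assistant\n"
TURN_END = 2

# Two user turns and two assistant turns from one multi-turn trajectory.
ids = [1, 10, 12, 50, 51, 2,   # user
       1, 11, 12, 60, 61, 2,   # assistant
       1, 10, 12, 52, 2,       # user (observation)
       1, 11, 12, 62, 2]       # assistant
labels = mask_non_assistant(ids, RESPONSE_START, TURN_END)
print(labels)
```

Every assistant turn in the trajectory contributes to the loss, while user prompts, observations, and the response marker itself are set to the ignore index.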
Training Configuration
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA + Unsloth (Merged in 16-bit)
- Max sequence length: 8192
- Epochs: 1
- Learning rate: 2e-06
- LoRA: r=16, alpha=32
- PER_DEVICE_TRAIN_BATCH_SIZE = 4
- GRAD_ACCUM = 4
- WARMUP_RATIO = 0.1
- WEIGHT_DECAY = 0.05
- NEFTUNE_NOISE_ALPHA = 5.0
- VAL_RATIO = 0.05
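As a rough sketch of what these values imply, the snippet below derives the effective batch size, the split sizes, and the warmup length. The dataset size (2000) and the single-device setup are assumptions for illustration only; neither is stated in this card, and the step count is approximate because trainers differ in how they round.

```python
import math

# Values taken from the configuration above.
PER_DEVICE_TRAIN_BATCH_SIZE = 4
GRAD_ACCUM = 4
WARMUP_RATIO = 0.1
VAL_RATIO = 0.05
EPOCHS = 1


def schedule_summary(num_examples, num_devices=1):
    """Approximate split sizes and optimizer-step counts for a given dataset size."""
    val_size = round(num_examples * VAL_RATIO)
    train_size = num_examples - val_size
    # Examples consumed per optimizer step.
    effective_batch = PER_DEVICE_TRAIN_BATCH_SIZE * GRAD_ACCUM * num_devices
    total_steps = math.ceil(train_size / effective_batch) * EPOCHS
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    return {
        "train_size": train_size,
        "val_size": val_size,
        "effective_batch": effective_batch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
    }


# num_examples=2000 is a hypothetical dataset size, not the real one.
summary = schedule_summary(num_examples=2000)
print(summary)
```

With one device, each optimizer step sees 4 x 4 = 16 examples, and about the first tenth of the steps are spent warming up the learning rate.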
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "choco800/qwen3-4b-agent-v14"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
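Once the model and tokenizer are loaded as above, a single agent step might be generated along these lines. This is a minimal sketch: the helper names build_messages and generate_reply are hypothetical, and the prompt wording is an assumption, since this card does not document the exact prompt format used in training.

```python
def build_messages(task, observation):
    """Wrap a task and the latest observation in a chat-format message list (assumed format)."""
    return [{"role": "user", "content": f"Task: {task}\nObservation: {observation}"}]


def generate_reply(model, tokenizer, messages, max_new_tokens=128):
    """Generate the assistant's next turn for the given messages."""
    import torch  # imported lazily so build_messages works without torch installed

    # Render the chat template as text, ending with the assistant turn marker.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


messages = build_messages(
    "put a clean mug on the desk",
    "You are in the middle of a room.",
)
# reply = generate_reply(model, tokenizer, messages)  # needs the loaded model and tokenizer
```

For multi-turn use, the generated reply would be appended as an assistant message and the next observation as a new user message before calling generate_reply again.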
Sources & Terms (IMPORTANT)
Training data:
- u-10bei/dbbench_sft_dataset_react (available on Hugging Face Hub)
- u-10bei/dbbench_sft_dataset_react_v2 (available on Hugging Face Hub)
- u-10bei/dbbench_sft_dataset_react_v3 (available on Hugging Face Hub)
- u-10bei/dbbench_sft_dataset_react_v4
Dataset License: MIT License. These datasets are used and distributed under the terms of the MIT License.
Compliance: Users must comply with the dataset licenses and the base model's original terms of use (Apache 2.0).