How to use choco800/qwen3-4b-agent-v12 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for choco800/qwen3-4b-agent-v12 to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for choco800/qwen3-4b-agent-v12 to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for choco800/qwen3-4b-agent-v12 to start chatting
Load model with FastModel
pip install unsloth

from unsloth import FastModel
model, tokenizer = FastModel.from_pretrained(
    model_name="choco800/qwen3-4b-agent-v12",
    max_seq_length=2048,
)
Qwen3-4B Agent Trajectory (v12)
This repository provides a fully merged model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Unsloth.
Unlike standard adapter repositories, this repository contains the merged weights, meaning you do not need to load the base model separately.
Training Objective
This model is trained to improve multi-turn agent task performance on ALFWorld (household tasks) and DBBench (database operations).
Loss is applied to all assistant turns in the multi-turn trajectory, enabling the model to learn environment observation, action selection, tool use, and recovery from errors.
Data Processing
- Train/Validation Split: 95% / 5%
- Random Seed: 3407 (used for shuffling and initialization)
- Loss Masking: Loss was computed only on the assistant's responses. User prompts and observations were masked during training (train_on_responses_only was applied to <|im_start|>assistant\n).
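The loss masking described above can be illustrated with a small, library-free sketch. This is not Unsloth's implementation of train_on_responses_only; it only shows the idea on toy token ids. The function name, the toy ids, and the choice to keep the end-of-turn token inside the loss are all assumptions made for illustration.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def mask_non_assistant(token_ids, response_start, turn_end):
    """Copy token ids into labels only inside assistant turns.

    token_ids:      list of ints for one full multi-turn conversation
    response_start: list of ints marking the start of an assistant turn
                    (stands in for the tokens of "<|im_start|>assistant\n")
    turn_end:       int that closes a turn (stands in for "<|im_end|>")
    """
    labels = [IGNORE_INDEX] * len(token_ids)
    n = len(response_start)
    i = 0
    while i < len(token_ids):
        if token_ids[i:i + n] == response_start:
            j = i + n  # first token of the assistant's reply
            while j < len(token_ids):
                labels[j] = token_ids[j]  # this position contributes to the loss
                j += 1
                if token_ids[j - 1] == turn_end:
                    break  # assumption: the end-of-turn token is also learned
            i = j
        else:
            i += 1  # user prompt, observation, or marker token: stays masked
    return labels


# Toy ids (hypothetical): 1 = <|im_start|>, 2 = <|im_end|>, 10 = user, 11 = assistant, 12 = "\n"
conversation = [
    1, 10, 12, 100, 101, 2,   # user turn
    1, 11, 12, 200, 201, 2,   # assistant turn
    1, 10, 12, 102, 2,        # user turn (environment observation)
    1, 11, 12, 202, 2,        # assistant turn
]
labels = mask_non_assistant(conversation, response_start=[1, 11, 12], turn_end=2)
```

Because every assistant turn in the trajectory is unmasked, not just the last one, each action the agent takes along the way receives a training signal.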
Training Configuration
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA + Unsloth (Merged in 16-bit)
- Max sequence length: 8192
- Epochs: 1
- Learning rate: 1e-05
- LoRA: r=16, alpha=32
- PER_DEVICE_TRAIN_BATCH_SIZE = 4
- GRAD_ACCUM = 2
- WARMUP_RATIO = 0.1
- WEIGHT_DECAY = 0.05
- NEFTUNE_NOISE_ALPHA = 5.0
- VAL_RATIO = 0.05
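To see what these settings imply in practice, the following sketch derives the effective batch size, an approximate step count, the warmup length, and the LoRA scaling factor. The dataset size and the single-device default are hypothetical, since the card does not state them, and the rounding choices are assumptions that may differ slightly from what the trainer actually did. The scaling factor assumes standard (non-rank-stabilized) LoRA.

```python
import math

# Values copied from the configuration listed above.
PER_DEVICE_TRAIN_BATCH_SIZE = 4
GRAD_ACCUM = 2
WARMUP_RATIO = 0.1
VAL_RATIO = 0.05
EPOCHS = 1
LORA_R = 16
LORA_ALPHA = 32


def training_schedule(num_examples, num_devices=1):
    """Approximate schedule for a dataset of `num_examples` conversations."""
    val_size = round(num_examples * VAL_RATIO)       # 5% held out for validation
    train_size = num_examples - val_size
    effective_batch = PER_DEVICE_TRAIN_BATCH_SIZE * GRAD_ACCUM * num_devices
    total_steps = math.ceil(train_size / effective_batch) * EPOCHS
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    return {
        "train_size": train_size,
        "val_size": val_size,
        "effective_batch": effective_batch,
        "total_steps": total_steps,
        "warmup_steps": warmup_steps,
        "lora_scaling": LORA_ALPHA / LORA_R,         # adapter update is scaled by alpha / r
    }


# Hypothetical dataset of 1,000 trajectories on one device.
schedule = training_schedule(1000)
```

With r=16 and alpha=32 the adapter update is scaled by 2, and each optimizer step sees 8 sequences per device.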
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "choco800/qwen3-4b-agent-v12"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
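Once the model is loaded, a reply can be generated through the tokenizer's chat template. The sketch below is one possible way to do this. The helper names, the example task text, and the generation settings are hypothetical, and the card does not document the exact prompt format used during training, so prompts may need adjusting for ALFWorld or DBBench.

```python
def build_messages(task_description, system_prompt=None):
    """Build a chat-format message list for a single user request."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": task_description})
    return messages


def generate_reply(messages, model_id="choco800/qwen3-4b-agent-v12", max_new_tokens=256):
    """Load the merged model and return the assistant's reply as text."""
    # Heavy imports live inside the function so build_messages works without them.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,
        device_map="auto",
    )

    # Render the conversation and append the assistant header so the model answers next.
    text = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)

    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)

    # Drop the prompt tokens and decode only the newly generated ones.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)


# Hypothetical task text; uncomment to run (downloads the model weights).
# print(generate_reply(build_messages("List the tables in the database.")))
```

For multi-turn agent use, append each assistant reply and each environment observation to the message list before calling the model again.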
Sources & Terms (IMPORTANT)
Training data:
- u-10bei/dbbench_sft_dataset_react
Dataset License: MIT License. These datasets are used and distributed under the terms of the MIT License.
Compliance: Users must comply with the dataset licenses and the base model's original terms of use (Apache 2.0).