choco800
/

qwen3-4b-agent-v21

Text Generation

Model card Files Files and versions

qwen3-4b-agent-v21 / README.md

choco800's picture

Upload README.md with huggingface_hub

30fda0f verified 3 months ago

|

history blame contribute delete

2.42 kB

	---
	base_model: Qwen/Qwen3-4B-Instruct-2507
	datasets:
	- u-10bei/sft_alfworld_trajectory_dataset_v5
	- u-10bei/sft_alfworld_trajectory_dataset_v4
	- u-10bei/sft_alfworld_trajectory_dataset_v3
	language:
	- en
	license: apache-2.0
	pipeline_tag: text-generation
	tags:
	- unsloth
	- agent
	- tool-use
	- alfworld
	---

	# Qwen3-4B Agent Trajectory (v21)

	This repository provides a fully merged model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using Unsloth.

	Unlike standard adapter repositories, this repository contains the merged weights, meaning you do not need to load the base model separately.

	## Training Objective

	This model is trained to improve multi-turn agent task performance
	on ALFWorld (household tasks).

	Loss is applied to all assistant turns in the multi-turn trajectory,
	enabling the model to learn environment observation, action selection,
	tool use, and recovery from errors.

	## Data Processing

	- Train/Validation Split: 95% / 5%
	- Random Seed: 3407 (used for shuffling and initialization)
	- Loss Masking: Loss was computed only on the assistant's responses. User prompts and observations were masked during training (`train_on_responses_only` was applied to `<\|im_start\|>assistant\n`).

	## Training Configuration
	- Base model: Qwen/Qwen3-4B-Instruct-2507
	- Method: LoRA + Unsloth (Merged in 16-bit)
	- Max sequence length: 8192
	- Epochs: 1
	- Learning rate: 3e-06
	- LoRA: r=16, alpha=32
	- PER_DEVICE_TRAIN_BATCH_SIZE = 4
	- GRAD_ACCUM = 2
	- WARMUP_RATIO = 0.1
	- WEIGHT_DECAY = 0.05
	- NEFTUNE_NOISE_ALPHA = 5.0
	- VAL_RATIO = 0.05

	## Usage

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	model_id = "choco800/qwen3-4b-agent-v21"

	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(
	model_id,
	torch_dtype=torch.bfloat16,
	device_map="auto",
	)
	```
	## Sources & Terms (IMPORTANT)

	Training data:
	- u-10bei/sft_alfworld_trajectory_dataset_v5 (available on Hugging Face Hub)
	- u-10bei/sft_alfworld_trajectory_dataset_v4 (available on Hugging Face Hub)
	- u-10bei/sft_alfworld_trajectory_dataset_v3 (available on Hugging Face Hub)

	Dataset License: MIT License. These datasets are used and distributed under the terms of the MIT License.
	Compliance: Users must comply with the dataset licenses and the base model's original terms of use (Apache 2.0).