Qwen-4B-DB-AlfWorld-v8
This repository provides a merged model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 on the datasets u-10bei/sft_alfworld_trajectory_dataset_v5, u-10bei/sft_alfworld_trajectory_dataset_v4, dbbench_sft_dataset_react_v2, and dbbench_sft_dataset_react_v3.
All LoRA adapter weights have been merged into the base model, and the
resulting merged model is saved here as a standalone model.
No external adapter loading is required.
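Conceptually, merging a LoRA adapter folds the low-rank update into the frozen base weight, W' = W + (alpha/r) * B @ A, so no adapter needs to be loaded at inference time. A toy torch sketch of that arithmetic (shapes are placeholders; r and alpha match the values reported below):

```python
import torch

# Toy illustration of a LoRA merge: the low-rank update B @ A, scaled by
# alpha / r, is added into the frozen base weight so no adapter is needed
# at inference time. Shapes here are tiny placeholders.
r, alpha = 8, 16            # the values reported in this card
W = torch.zeros(4, 4)       # base weight (placeholder)
A = torch.ones(r, 4)        # LoRA down-projection
B = torch.ones(4, r)        # LoRA up-projection
W_merged = W + (alpha / r) * (B @ A)
print(W_merged[0, 0].item())  # each entry is (16/8) * 8 = 16.0
```

In practice this is what peft's `merge_and_unload()` performs for every adapted layer before the standalone weights are saved.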
Dataset Notes (IMPORTANT)
For u-10bei/sft_alfworld_trajectory_dataset_v5,
only samples with:
- input length ≤ 2048 tokens,
- trajectory_outcome == "success",
- num_steps ≤ 35
were included in the training set.
These filters were applied to ensure training stability, reduce noisy or failed trajectories, and maintain consistency with the maximum sequence length used during training.
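The filtering rules above can be sketched on toy records as follows. The field names (`trajectory_outcome`, `num_steps`) come from this card; how input length was tokenized is not stated, so the token count is passed in as a plain number here:

```python
# Sketch of the sample-filtering rules described above, applied to toy
# records. trajectory_outcome and num_steps are the field names given in
# this card; input_token_count stands in for the real tokenized length.
MAX_TOKENS, MAX_STEPS = 2048, 35

def keep(sample, input_token_count):
    return (
        input_token_count <= MAX_TOKENS
        and sample["trajectory_outcome"] == "success"
        and sample["num_steps"] <= MAX_STEPS
    )

samples = [
    ({"trajectory_outcome": "success", "num_steps": 12}, 1500),
    ({"trajectory_outcome": "failure", "num_steps": 12}, 1500),  # dropped: failed
    ({"trajectory_outcome": "success", "num_steps": 40}, 1500),  # dropped: too long
]
kept = [s for s, n in samples if keep(s, n)]
print(len(kept))  # 1
```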
Training Objective
This model is trained to improve multi-turn agent task performance on ALFWorld (household tasks) and DBBench (database operations).
Loss is applied to all assistant turns in multi-turn trajectories, enabling the model to learn environment observation, step-by-step reasoning, action execution, tool use, and recovery from errors.
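Applying loss only to assistant turns is commonly implemented by setting the labels of all other tokens to -100, PyTorch's default `ignore_index` for cross-entropy. A minimal sketch of that masking (token IDs and role spans are illustrative, not from this card):

```python
# Toy sketch of assistant-only loss masking for multi-turn SFT: tokens
# belonging to user/system turns get label -100 (PyTorch's default
# ignore_index), so only assistant tokens contribute to the loss.
IGNORE_INDEX = -100

def build_labels(token_ids, roles):
    """roles[i] is the speaker of token i ('user', 'assistant', ...)."""
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, roles)
    ]

tokens = [10, 11, 12, 13, 14]
roles = ["user", "user", "assistant", "assistant", "user"]
print(build_labels(tokens, roles))  # [-100, -100, 12, 13, -100]
```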
Training Configuration
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA (merged into final weights)
- Max sequence length: 2048
- Learning rate: 1e-06
- LoRA parameters used during training: r=8, alpha=16
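A peft configuration matching the hyperparameters above might look like the following. The `target_modules` list is a typical choice for Qwen-style attention layers and is an assumption; this card does not state which modules were adapted:

```python
from peft import LoraConfig

# Reconstruction of the reported LoRA hyperparameters (r=8, alpha=16).
# target_modules below are a common choice for Qwen-style models and are
# an assumption, not taken from this card.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```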
Usage (Agent-style Inference Example)
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Umiharu/Qwen-4B-DB-AlfWorld-v8"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "You are a household task-solving agent. Respond 'OK' if you are ready."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding: `temperature` has no effect when do_sample=False,
# so it is omitted here.
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=False,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Sources & Terms (IMPORTANT)
Training data: u-10bei/sft_alfworld_trajectory_dataset_v5, u-10bei/sft_alfworld_trajectory_dataset_v4, dbbench_sft_dataset_react_v2, and dbbench_sft_dataset_react_v3.
Dataset license: MIT. The datasets are used and distributed under the terms of the MIT License.
Compliance: Users must comply with the MIT License (including retention of the copyright notice) and with the base model's original terms of use.