Qwen-4B-DB-AlfWorld-v8

This repository provides a merged model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 on the datasets u-10bei/sft_alfworld_trajectory_dataset_v5, u-10bei/sft_alfworld_trajectory_dataset_v4, dbbench_sft_dataset_react_v2, and dbbench_sft_dataset_react_v3.

All LoRA adapter weights have been merged into the base model, and the resulting merged model is saved here as a standalone model.
No external adapter loading is required.

Dataset Notes (IMPORTANT)

For u-10bei/sft_alfworld_trajectory_dataset_v5, only samples meeting all of the following criteria were included in the training set:

  • input length ≤ 2048 tokens
  • trajectory_outcome == "success"
  • num_steps ≤ 35

These filters were applied to ensure training stability, reduce noisy or failed trajectories, and maintain consistency with the maximum sequence length used during training.
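The filtering above amounts to a simple predicate over dataset rows. A minimal sketch: the fields `trajectory_outcome` and `num_steps` come from the dataset description above, while the tokenizer-based length check (stubbed here with a toy word counter) is an assumption about how "input length" was measured:

```python
def keep_sample(row, token_len):
    """Return True if a trajectory row passes all three training filters."""
    return (
        token_len(row["input"]) <= 2048             # input length cap
        and row["trajectory_outcome"] == "success"  # successful trajectories only
        and row["num_steps"] <= 35                  # trajectory length cap
    )

# Toy stand-in for a real tokenizer: one token per whitespace-separated word.
toy_token_len = lambda text: len(text.split())

rows = [
    {"input": "go to the fridge", "trajectory_outcome": "success", "num_steps": 12},
    {"input": "open the drawer",  "trajectory_outcome": "failure", "num_steps": 8},
    {"input": "take the mug",     "trajectory_outcome": "success", "num_steps": 40},
]
kept = [r for r in rows if keep_sample(r, toy_token_len)]
```

In this toy example only the first row survives: the second fails the outcome filter and the third exceeds the step cap.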

Training Objective

This model is trained to improve multi-turn agent task performance on ALFWorld (household tasks) and DBBench (database operations).

Loss is applied to all assistant turns in multi-turn trajectories, enabling the model to learn environment observation, step-by-step reasoning, action execution, tool use, and recovery from errors.
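Supervising every assistant turn amounts to masking non-assistant tokens out of the loss. A minimal sketch, where the -100 ignore index follows the PyTorch cross-entropy convention and the turn structure and token ids are purely illustrative:

```python
IGNORE_INDEX = -100  # tokens with this label are excluded from the loss

def build_labels(turns):
    """turns: list of (role, token_ids). Only assistant tokens are supervised."""
    input_ids, labels = [], []
    for role, ids in turns:
        input_ids.extend(ids)
        labels.extend(ids if role == "assistant" else [IGNORE_INDEX] * len(ids))
    return input_ids, labels

turns = [
    ("system",    [1, 2]),
    ("user",      [3, 4, 5]),    # environment observation
    ("assistant", [6, 7]),       # reasoning + action: supervised
    ("user",      [8]),          # next observation
    ("assistant", [9, 10, 11]),  # supervised
]
input_ids, labels = build_labels(turns)
```

Because both assistant turns carry labels, the model learns from every step of the trajectory rather than only the final response.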

Training Configuration

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Method: LoRA (merged into final weights)
  • Max sequence length: 2048
  • Learning rate: 1e-06
  • LoRA parameters used during training: r=8, alpha=16
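With PEFT, the listed hyperparameters roughly correspond to the configuration below. The target modules and dropout are assumptions, not documented settings; `merge_and_unload()` is the step that produces the standalone merged weights distributed here:

```python
from peft import LoraConfig, get_peft_model  # requires the `peft` package

lora_config = LoraConfig(
    r=8,               # documented rank
    lora_alpha=16,     # documented alpha
    lora_dropout=0.05,  # assumption: not stated in this card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)

# After SFT, the adapters are folded back into the base weights:
# model = get_peft_model(base_model, lora_config)
# ... training ...
# merged = model.merge_and_unload()
# merged.save_pretrained("Qwen-4B-DB-AlfWorld-v8")
```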

Usage (Agent-style Inference Example)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Umiharu/Qwen-4B-DB-AlfWorld-v8"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

prompt = "You are a household task-solving agent. Respond 'OK' if you are ready."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,     # required for temperature to take effect
    temperature=0.2,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
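For multi-turn agent use, the prompt is normally built from a messages list via the tokenizer's chat template rather than a raw string. A minimal sketch, where the observation and task text are illustrative only:

```python
# Standard chat-message format understood by Qwen instruct tokenizers.
messages = [
    {"role": "system", "content": "You are a household task-solving agent."},
    {"role": "user", "content": "Observation: You are in the kitchen. "
                                "Task: put a clean mug on the table."},
]

# With the tokenizer and model loaded as above, this renders the template and
# appends the assistant header so generation continues as the agent's turn:
# inputs = tokenizer.apply_chat_template(
#     messages, add_generation_prompt=True, return_tensors="pt"
# ).to(model.device)
# outputs = model.generate(inputs, max_new_tokens=64)
```

Appending each environment observation as a new user message and each model action as an assistant message keeps the trajectory in the same multi-turn format used during training.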

Sources & Terms (IMPORTANT)

Training data: u-10bei/sft_alfworld_trajectory_dataset_v5, u-10bei/sft_alfworld_trajectory_dataset_v4, dbbench_sft_dataset_react_v2, and dbbench_sft_dataset_react_v3

Dataset License: MIT. The datasets are used and distributed under the terms of the MIT License. Compliance: users must comply with the MIT License (including retention of the copyright notice) and with the base model's original terms of use.
