Qwen-4B-DB-AlfWorld-v8

This repository provides a merged model fine-tuned from Qwen/Qwen3-4B-Instruct-2507 on the datasets u-10bei/sft_alfworld_trajectory_dataset_v5, u-10bei/sft_alfworld_trajectory_dataset_v4, dbbench_sft_dataset_react_v2, and dbbench_sft_dataset_react_v3.

All LoRA adapter weights have been merged into the base model, and the resulting merged model is saved here as a standalone model.
No external adapter loading is required.

Dataset Notes (IMPORTANT)

For u-10bei/sft_alfworld_trajectory_dataset_v5, only samples meeting all of the following criteria were included in the training set:

  • input length ≤ 2048 tokens
  • trajectory_outcome == "success"
  • num_steps ≤ 35

These filters were applied to ensure training stability, reduce noisy or failed trajectories, and maintain consistency with the maximum sequence length used during training.
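The filtering above amounts to a simple predicate over dataset rows. A minimal sketch: the fields `trajectory_outcome` and `num_steps` come from the dataset description above, while the tokenizer-based length check (stubbed here with a toy word counter) is an assumption about how "input length" was measured:

```python
def keep_sample(row, token_len):
    """Return True if a trajectory row passes all three training filters."""
    return (
        token_len(row["input"]) <= 2048             # input length cap
        and row["trajectory_outcome"] == "success"  # successful trajectories only
        and row["num_steps"] <= 35                  # trajectory length cap
    )

# Toy stand-in for a real tokenizer: one token per whitespace-separated word.
toy_token_len = lambda text: len(text.split())

rows = [
    {"input": "go to the fridge", "trajectory_outcome": "success", "num_steps": 12},
    {"input": "open the drawer",  "trajectory_outcome": "failure", "num_steps": 8},
    {"input": "take the mug",     "trajectory_outcome": "success", "num_steps": 40},
]
kept = [r for r in rows if keep_sample(r, toy_token_len)]
```

In this toy example only the first row survives: the second fails the outcome filter and the third exceeds the step cap.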

Training Objective

This model is trained to improve multi-turn agent task performance on ALFWorld (household tasks) and DBBench (database operations).

Loss is applied to all assistant turns in multi-turn trajectories, enabling the model to learn environment observation, step-by-step reasoning, action execution, tool use, and recovery from errors.
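Supervising every assistant turn amounts to masking non-assistant tokens out of the loss. A minimal sketch, where the -100 ignore index follows the PyTorch cross-entropy convention and the turn structure and token ids are purely illustrative:

```python
IGNORE_INDEX = -100  # tokens with this label are excluded from the loss

def build_labels(turns):
    """turns: list of (role, token_ids). Only assistant tokens are supervised."""
    input_ids, labels = [], []
    for role, ids in turns:
        input_ids.extend(ids)
        labels.extend(ids if role == "assistant" else [IGNORE_INDEX] * len(ids))
    return input_ids, labels

turns = [
    ("system",    [1, 2]),
    ("user",      [3, 4, 5]),    # environment observation
    ("assistant", [6, 7]),       # reasoning + action: supervised
    ("user",      [8]),          # next observation
    ("assistant", [9, 10, 11]),  # supervised
]
input_ids, labels = build_labels(turns)
```

Because both assistant turns carry labels, the model learns from every step of the trajectory rather than only the final response.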

Training Configuration

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Method: LoRA (merged into final weights)
  • Max sequence length: 2048
  • Learning rate: 1e-06
  • LoRA parameters used during training: r=8, alpha=16
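With PEFT, the listed hyperparameters roughly correspond to the configuration below. The target modules and dropout are assumptions, not documented settings; `merge_and_unload()` is the step that produces the standalone merged weights distributed here:

```python
from peft import LoraConfig, get_peft_model  # requires the `peft` package

lora_config = LoraConfig(
    r=8,               # documented rank
    lora_alpha=16,     # documented alpha
    lora_dropout=0.05,  # assumption: not stated in this card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)

# After SFT, the adapters are folded back into the base weights:
# model = get_peft_model(base_model, lora_config)
# ... training ...
# merged = model.merge_and_unload()
# merged.save_pretrained("Qwen-4B-DB-AlfWorld-v8")
```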

Usage (Agent-style Inference Example)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Umiharu/Qwen-4B-DB-AlfWorld-v8"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

prompt = "You are a household task-solving agent. Respond 'OK' if you are ready."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,     # required for temperature to take effect
    temperature=0.2,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
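For multi-turn agent use, the prompt is normally built from a messages list via the tokenizer's chat template rather than a raw string. A minimal sketch, where the observation and task text are illustrative only:

```python
# Standard chat-message format understood by Qwen instruct tokenizers.
messages = [
    {"role": "system", "content": "You are a household task-solving agent."},
    {"role": "user", "content": "Observation: You are in the kitchen. "
                                "Task: put a clean mug on the table."},
]

# With the tokenizer and model loaded as above, this renders the template and
# appends the assistant header so generation continues as the agent's turn:
# inputs = tokenizer.apply_chat_template(
#     messages, add_generation_prompt=True, return_tensors="pt"
# ).to(model.device)
# outputs = model.generate(inputs, max_new_tokens=64)
```

Appending each environment observation as a new user message and each model action as an assistant message keeps the trajectory in the same multi-turn format used during training.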

Sources & Terms (IMPORTANT)

Training data: u-10bei/sft_alfworld_trajectory_dataset_v5, u-10bei/sft_alfworld_trajectory_dataset_v4, dbbench_sft_dataset_react_v2, and dbbench_sft_dataset_react_v3

Dataset License: MIT. The datasets are used and distributed under the terms of the MIT License. Compliance: users must comply with the MIT License (including retention of the copyright notice) and with the base model's original terms of use.
