qwen25-7b-sft-merged-v5v6-a50

This repository provides a fully merged model fine-tuned from Qwen2.5-7B-Instruct using QLoRA + Unsloth.

Two SFT models (v5 and v6) were trained independently, then combined via weight interpolation (alpha=0.5). This is a complete model — no adapters or additional weights are needed.

Training Objective

This model is trained to improve multi-turn agent task performance on ALFWorld (household tasks) and DBBench (database operations).

Loss is applied to all assistant turns in the multi-turn trajectory, enabling the model to learn environment observation, action selection, tool use, and recovery from errors.

Training Configuration

  • Base model: Qwen/Qwen2.5-7B-Instruct
  • Method: QLoRA (4-bit) + Unsloth, merged into base model
  • Max sequence length: 2048
  • Epochs: 2
  • Learning rate: 5e-5
  • LoRA: r=32, alpha=64
  • Post-training: weight interpolation of v5 and v6 (alpha=0.5)

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "plotMaker/qwen25-7b-sft-merged-v5v6-a50"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

References

Sources & Terms (IMPORTANT)

Training data:

  • u-10bei/sft_alfworld_trajectory_dataset_v2 ~ v5
  • u-10bei/dbbench_sft_dataset_react ~ v4

Base model: Qwen/Qwen2.5-7B-Instruct

This repository does NOT redistribute the dataset. Users must comply with the dataset license and base model terms.

Downloads last month
73
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for plotMaker/qwen25-7b-sft-merged-v5v6-a50

Base model

Qwen/Qwen2.5-7B
Finetuned
(2675)
this model

Datasets used to train plotMaker/qwen25-7b-sft-merged-v5v6-a50

Papers for plotMaker/qwen25-7b-sft-merged-v5v6-a50