qwen2.5-7b-agent-dbv4_alfv5_final_4epoch

A merged agent SFT model based on Qwen/Qwen2.5-7B-Instruct, fine-tuned for AgentBench-style multi-task agent behavior, specifically:

  • DBBench (SQL generation with structured reasoning / tool-style outputs)
  • ALFWorld (multi-step household task trajectory planning)

This repository contains fully merged weights (LoRA merged into the base model) and the corresponding tokenizer.


Model Summary

  • Base model: Qwen/Qwen2.5-7B-Instruct
  • Training style: Supervised fine-tuning (SFT), multi-turn trajectories
  • Data: Mixed agent trajectories for DBBench + ALFWorld
  • Special focus (DBBench): Improved SQL string-literal fidelity via an explicit reminder appended to the first user message
  • Epochs: 4

This is a specialized agent model. General chat quality may be worse than the base model.


Expected Prompting Style

This model is trained on system-tagged multi-task data; include the matching task tag in the system message to select a mode.

DBBench mode

Include a system message containing:

<TASK: DBBench>

ALFWorld mode

Include a system message containing:

<TASK: ALFWorld>
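A minimal sketch of how the two task tags might be placed in conversations. The tags are the ones documented above; the user-turn contents are illustrative, not taken from the training data:

```python
# Illustrative message lists for the two task modes.
# The user content below is made up for demonstration.
dbbench_messages = [
    {"role": "system", "content": "<TASK: DBBench>"},
    {"role": "user", "content": "How many rows are in the employees table?"},
]

alfworld_messages = [
    {"role": "system", "content": "<TASK: ALFWorld>"},
    {"role": "user", "content": "You are in a kitchen. Put a clean mug on the shelf."},
]

# The tag must appear in the system turn so the model can route
# to the corresponding task behavior.
assert "<TASK: DBBench>" in dbbench_messages[0]["content"]
assert "<TASK: ALFWorld>" in alfworld_messages[0]["content"]
```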


SQL String-Literal Reminder (DBBench)

During training, DBBench examples were modified to include the following reminder in the first user message:

  • Copy string literals inside quotes EXACTLY from database outputs
  • Preserve whitespace, punctuation, and capitalization inside quotes
  • If the exact string is not given, SELECT it first and then copy it verbatim
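A sketch of how such a reminder could be appended to the first user turn at data-preparation time. The exact reminder wording used in training is not published here, so the text below is an assumption that paraphrases the three points above:

```python
# Assumed reminder text; the actual training-time wording may differ.
SQL_LITERAL_REMINDER = (
    "\n\nReminder: copy string literals inside quotes EXACTLY from "
    "database outputs, preserving whitespace, punctuation, and "
    "capitalization. If the exact string is not given, SELECT it "
    "first and copy it verbatim."
)

def append_reminder(messages):
    """Return a copy of the conversation with the reminder appended
    to the first user turn."""
    out = [dict(m) for m in messages]  # shallow copy of each turn
    for m in out:
        if m["role"] == "user":
            m["content"] += SQL_LITERAL_REMINDER
            break  # only the first user message is modified
    return out
```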

How to Use

Example (Transformers):

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_id = "HamadaMayu/qwen2.5-7b-agent-dbv4_alfv5_final_4epoch"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "<TASK: DBBench>"},
    {"role": "user", "content": "Write a SQL query to count the number of rows in table employees."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,
)

print(tokenizer.decode(out[0], skip_special_tokens=True))

Notes:

  • Use do_sample=False for deterministic behavior (often preferred in agent benchmarks).
  • If outputs are too long, lower max_new_tokens.

Training Details (High Level)

Datasets:

  • ALFWorld trajectories: u-10bei/sft_alfworld_trajectory_dataset_v5 (filtered to success; capped by step length)
  • DBBench trajectories: u-10bei/dbbench_sft_dataset_react_v4 (with a DB string-literal reminder appended)

Multi-task control:

  • System tags: <TASK: DBBench>, <TASK: ALFWorld>

Loss masking:

  • Loss is applied only to assistant turns (assistant-only supervision)
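Assistant-only supervision is commonly implemented by setting the label of every non-assistant token to -100, the index that PyTorch cross-entropy (and hence the Transformers trainer) ignores. A minimal, framework-free sketch of that masking, assuming role spans over the tokenized sequence are already known (the helper name and span format are illustrative):

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch cross-entropy loss

def mask_labels(token_ids, role_spans):
    """Build labels where only assistant spans are supervised.

    role_spans: list of (role, start, end) half-open index ranges
    covering token_ids.
    """
    labels = [IGNORE_INDEX] * len(token_ids)
    for role, start, end in role_spans:
        if role == "assistant":
            # Supervise assistant tokens; everything else stays masked.
            labels[start:end] = token_ids[start:end]
    return labels
```

For example, a 10-token sequence whose last four tokens belong to the assistant turn contributes loss only on those four positions.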

Max sequence length:

  • 4096

LoRA:

  • LoRA adapter merged into the base model weights before upload

Limitations

  • This is a specialized agent SFT model; general chat capability may degrade compared to the base model.
  • SQL string-literal fidelity improves mainly when:
    • DB outputs provide the target strings, or
    • the task allows fetching them via SELECT before using them.
  • Multi-step agent behavior can still fail due to:
    • long-horizon planning difficulty
    • missing recovery patterns in training data
    • context truncation if inputs exceed max length

Acknowledgements

Base model:

  • Qwen2.5 by the Qwen team

Datasets / training resources:

  • ALFWorld & DBBench trajectories via AgentBench training resources

Training stack:

  • Transformers
  • Datasets
  • PEFT
  • Unsloth