qwen2.5-7b-agent-dbv4_alfv5_final_4epoch
A merged agent SFT model based on Qwen/Qwen2.5-7B-Instruct, fine-tuned for AgentBench-style multi-task agent behavior, specifically:
- DBBench (SQL generation with structured reasoning / tool-style outputs)
- ALFWorld (multi-step household task trajectory planning)
This repository contains fully merged weights (LoRA merged into the base model) and the corresponding tokenizer.
Model Summary
- Base model: Qwen/Qwen2.5-7B-Instruct
- Training style: Supervised fine-tuning (SFT), multi-turn trajectories
- Data: Mixed agent trajectories for DBBench + ALFWorld
- Special focus (DBBench): Improved SQL string-literal fidelity via an explicit reminder appended to the first user message
- Epochs: 4
This is a specialized agent model. General chat quality may be worse than the base model.
Expected Prompting Style
This model is tuned on system-tagged multi-task data, so include the matching task tag in the system message.
DBBench mode
Include a system message containing:
<TASK: DBBench>
ALFWorld mode
Include a system message containing:
<TASK: ALFWorld>
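As a minimal sketch, an ALFWorld-mode conversation just carries the task tag in the system turn; the user text below is illustrative only and not taken from the training data:

```python
# Hypothetical ALFWorld-mode prompt: the system message carries the task tag,
# and the user message holds the environment observation / goal.
messages = [
    {"role": "system", "content": "<TASK: ALFWorld>"},
    {
        "role": "user",
        "content": "You are in the middle of a room. Your task is to: put a clean mug on the desk.",
    },
]
```

The same structure applies to DBBench mode with `<TASK: DBBench>` in the system message.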
SQL String-Literal Reminder (DBBench)
During training, DBBench examples were modified to include the following reminder in the first user message:
- Copy string literals inside quotes EXACTLY from database outputs
- Preserve whitespace, punctuation, and capitalization inside quotes
- If the exact string is not given, SELECT it first and then copy it verbatim
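To match the training distribution at inference time, you can append the same reminder to the first user turn. The helper below is a hypothetical re-creation (the function name and exact formatting are assumptions; only the reminder text comes from this card):

```python
# Reminder text as described in this card; the surrounding formatting is assumed.
REMINDER = (
    "\n\nReminder:\n"
    "- Copy string literals inside quotes EXACTLY from database outputs\n"
    "- Preserve whitespace, punctuation, and capitalization inside quotes\n"
    "- If the exact string is not given, SELECT it first and then copy it verbatim"
)

def append_reminder(messages):
    """Return a copy of `messages` with REMINDER appended to the first user turn."""
    out = [dict(m) for m in messages]  # shallow copies so the input stays untouched
    for m in out:
        if m["role"] == "user":
            m["content"] += REMINDER
            break  # only the FIRST user message is modified
    return out
```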
How to Use
Example (Transformers):
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "HamadaMayu/qwen2.5-7b-agent-dbv4_alfv5_final_4epoch"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "<TASK: DBBench>"},
    {"role": "user", "content": "Write a SQL query to count the number of rows in table employees."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,
)

print(tokenizer.decode(out[0], skip_special_tokens=True))
Notes:
- Use do_sample=False for deterministic behavior (often preferred in agent benchmarks).
- If outputs are too long, lower max_new_tokens.
Training Details (High Level)
Datasets:
- ALFWorld trajectories: u-10bei/sft_alfworld_trajectory_dataset_v5 (filtered to success; capped by step length)
- DBBench trajectories: u-10bei/dbbench_sft_dataset_react_v4 (with a DB string-literal reminder appended)
Multi-task control:
- System tags: <TASK: DBBench>, <TASK: ALFWorld>
Loss masking:
- Loss is applied only to assistant turns (assistant-only supervision)
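Assistant-only supervision can be sketched as labeling non-assistant spans with the ignore index so they contribute nothing to the loss. The turn layout below (parallel role/length lists) is a hypothetical simplification, not the actual training code:

```python
IGNORE_INDEX = -100  # ignored by PyTorch's cross-entropy loss

def mask_non_assistant(token_ids, turn_roles, turn_lengths):
    """Build labels where only assistant-turn tokens are supervised.

    token_ids: flat list of token ids for the whole conversation.
    turn_roles / turn_lengths: parallel lists describing which contiguous
    span of tokens belongs to which speaker (assumed layout).
    """
    labels = []
    pos = 0
    for role, length in zip(turn_roles, turn_lengths):
        span = token_ids[pos:pos + length]
        labels.extend(span if role == "assistant" else [IGNORE_INDEX] * length)
        pos += length
    return labels
```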
Max sequence length:
- 4096
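Since inputs beyond the training length get truncated, a rough token-budget check before generation can help. A minimal sketch (the function is illustrative, not part of this repository):

```python
MAX_SEQ_LEN = 4096  # training max sequence length from this card

def fits_in_context(prompt_tokens, max_new_tokens=256, max_seq_len=MAX_SEQ_LEN):
    """Rough pre-check: prompt plus planned generation must fit the context."""
    return prompt_tokens + max_new_tokens <= max_seq_len
```

In practice you would count `prompt_tokens` by tokenizing the output of `tokenizer.apply_chat_template(...)`.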
LoRA:
- LoRA adapter merged into the base model weights before upload
Limitations
- This is a specialized agent SFT model; general chat capability may degrade compared to the base model.
- SQL string-literal fidelity improves mainly when:
- DB outputs provide the target strings, or
- the task allows fetching them via SELECT before using them.
- Multi-step agent behavior can still fail due to:
- long-horizon planning difficulty
- missing recovery patterns in training data
- context truncation if inputs exceed max length
Acknowledgements
Base model:
- Qwen2.5 by the Qwen team
Datasets / training resources:
- ALFWorld & DBBench trajectories via AgentBench training resources
Training stack:
- Transformers
- Datasets
- PEFT
- Unsloth