qwen2.5-7b-agent-dbv4_alfv5_final_4epoch
A merged agent SFT model based on Qwen/Qwen2.5-7B-Instruct, fine-tuned for AgentBench-style multi-task agent behavior, specifically:
- DBBench (SQL generation with structured reasoning / tool-style outputs)
- ALFWorld (multi-step household task trajectory planning)
This repository contains fully merged weights (LoRA merged into the base model) and the corresponding tokenizer.
Model Summary
- Base model: Qwen/Qwen2.5-7B-Instruct
- Training style: Supervised fine-tuning (SFT), multi-turn trajectories
- Data: Mixed agent trajectories for DBBench + ALFWorld
- Special focus (DBBench): Improved SQL string-literal fidelity via an explicit reminder appended to the first user message
- Epochs: 4
This is a specialized agent model. General chat quality may be worse than the base model.
Expected Prompting Style
This model is tuned on system-tagged multi-task data, so include the matching task tag in the system message.
DBBench mode
Include a system message containing:
<TASK: DBBench>
ALFWorld mode
Include a system message containing:
<TASK: ALFWorld>
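As a minimal sketch, an ALFWorld-mode conversation just carries the task tag in the system turn; the user text below is illustrative only and not taken from the training data:

```python
# Hypothetical ALFWorld-mode prompt: the system message carries the task tag,
# and the user message holds the environment observation / goal.
messages = [
    {"role": "system", "content": "<TASK: ALFWorld>"},
    {
        "role": "user",
        "content": "You are in the middle of a room. Your task is to: put a clean mug on the desk.",
    },
]
```

The same structure applies to DBBench mode with `<TASK: DBBench>` in the system message.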
SQL String-Literal Reminder (DBBench)
During training, DBBench examples were modified to include the following reminder in the first user message:
- Copy string literals inside quotes EXACTLY from database outputs
- Preserve whitespace, punctuation, and capitalization inside quotes
- If the exact string is not given, SELECT it first and then copy it verbatim
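To match the training distribution at inference time, you can append the same reminder to the first user turn. The helper below is a hypothetical re-creation (the function name and exact formatting are assumptions; only the reminder text comes from this card):

```python
# Reminder text as described in this card; the surrounding formatting is assumed.
REMINDER = (
    "\n\nReminder:\n"
    "- Copy string literals inside quotes EXACTLY from database outputs\n"
    "- Preserve whitespace, punctuation, and capitalization inside quotes\n"
    "- If the exact string is not given, SELECT it first and then copy it verbatim"
)

def append_reminder(messages):
    """Return a copy of `messages` with REMINDER appended to the first user turn."""
    out = [dict(m) for m in messages]  # shallow copies so the input stays untouched
    for m in out:
        if m["role"] == "user":
            m["content"] += REMINDER
            break  # only the FIRST user message is modified
    return out
```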
How to Use
Example (Transformers):
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "HamadaMayu/qwen2.5-7b-agent-dbv4_alfv5_final_4epoch"

tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "system", "content": "<TASK: DBBench>"},
    {"role": "user", "content": "Write a SQL query to count the number of rows in table employees."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

out = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,
)

print(tokenizer.decode(out[0], skip_special_tokens=True))
Notes:
- Use do_sample=False for deterministic behavior (often preferred in agent benchmarks).
- If outputs are too long, lower max_new_tokens.
Training Details (High Level)
Datasets:
- ALFWorld trajectories: u-10bei/sft_alfworld_trajectory_dataset_v5 (filtered to success; capped by step length)
- DBBench trajectories: u-10bei/dbbench_sft_dataset_react_v4 (with a DB string-literal reminder appended)
Multi-task control:
- System tags: <TASK: DBBench>, <TASK: ALFWorld>
Loss masking:
- Loss is applied only to assistant turns (assistant-only supervision)
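Assistant-only supervision can be sketched as labeling non-assistant spans with the ignore index so they contribute nothing to the loss. The turn layout below (parallel role/length lists) is a hypothetical simplification, not the actual training code:

```python
IGNORE_INDEX = -100  # ignored by PyTorch's cross-entropy loss

def mask_non_assistant(token_ids, turn_roles, turn_lengths):
    """Build labels where only assistant-turn tokens are supervised.

    token_ids: flat list of token ids for the whole conversation.
    turn_roles / turn_lengths: parallel lists describing which contiguous
    span of tokens belongs to which speaker (assumed layout).
    """
    labels = []
    pos = 0
    for role, length in zip(turn_roles, turn_lengths):
        span = token_ids[pos:pos + length]
        labels.extend(span if role == "assistant" else [IGNORE_INDEX] * length)
        pos += length
    return labels
```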
Max sequence length:
- 4096
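Since inputs beyond the training length get truncated, a rough token-budget check before generation can help. A minimal sketch (the function is illustrative, not part of this repository):

```python
MAX_SEQ_LEN = 4096  # training max sequence length from this card

def fits_in_context(prompt_tokens, max_new_tokens=256, max_seq_len=MAX_SEQ_LEN):
    """Rough pre-check: prompt plus planned generation must fit the context."""
    return prompt_tokens + max_new_tokens <= max_seq_len
```

In practice you would count `prompt_tokens` by tokenizing the output of `tokenizer.apply_chat_template(...)`.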
LoRA:
- LoRA adapter merged into the base model weights before upload
Limitations
- This is a specialized agent SFT model; general chat capability may degrade compared to the base model.
- SQL string-literal fidelity improves mainly when:
- DB outputs provide the target strings, or
- the task allows fetching them via SELECT before using them.
- Multi-step agent behavior can still fail due to:
- long-horizon planning difficulty
- missing recovery patterns in training data
- context truncation if inputs exceed max length
Acknowledgements
Base model:
- Qwen2.5 by the Qwen team
Datasets / training resources:
- ALFWorld & DBBench trajectories via AgentBench training resources
Training stack:
- Transformers
- Datasets
- PEFT
- Unsloth