qwen2.5-7b-instruct-sft-v4
This repository provides a merged full model produced by supervised fine-tuning (SFT), targeting robustness on AgentBench-style ALFWorld and DBBench tasks.
Training Objective
Improve strict action selection reliability for ALFWorld prompts and strengthen SQL error-recovery robustness for DBBench prompts.
Training Configuration
- Method: SFT (Unsloth LoRA) + merge to full model
- Base model ID: Qwen/Qwen2.5-7B-Instruct
- LoRA: r=16, alpha=32, dropout=0.0
- LoRA target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Output learn mode: from_marker(ACTION:)
- Max sequence length: 4096
- Max steps: 350
- Epochs: 1
- Learning rate: 1.0e-6
- Per-device train batch size: 1
- Per-device eval batch size: 2
- Gradient accumulation steps: 32
- Effective global batch size: 32
- Warmup ratio: 0.03
- Weight decay: 0.01
- Eval/Save steps: 50 / 50
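The card does not spell out what the from_marker(ACTION:) learn mode does. One plausible reading is that the loss is computed only on tokens from the ACTION: marker onward, with everything before it masked out. The sketch below illustrates that reading on plain lists of token IDs. The helper names, the use of the last marker occurrence, and the choice to keep the marker tokens in the loss are all assumptions made for illustration, not details taken from the training code.

```python
from typing import List, Sequence

# Label value that PyTorch's cross-entropy loss ignores by default.
IGNORE_INDEX = -100


def find_last_subsequence(haystack: Sequence[int], needle: Sequence[int]) -> int:
    """Return the start index of the last occurrence of needle, or -1 if absent."""
    n = len(needle)
    if n == 0:
        return -1
    for start in range(len(haystack) - n, -1, -1):
        if list(haystack[start:start + n]) == list(needle):
            return start
    return -1


def labels_from_marker(input_ids: Sequence[int], marker_ids: Sequence[int]) -> List[int]:
    """Hypothetical helper: keep labels from the marker onward, mask the rest.

    If the marker is not found, every position is masked so the example
    contributes nothing to the loss (an assumption, not documented behavior).
    """
    start = find_last_subsequence(input_ids, marker_ids)
    if start < 0:
        return [IGNORE_INDEX] * len(input_ids)
    return [IGNORE_INDEX] * start + list(input_ids[start:])


# Toy token IDs: 7, 8 stand in for the tokenized "ACTION:" marker.
example_labels = labels_from_marker([5, 6, 7, 8, 9], [7, 8])
print(example_labels)
```

In a real pipeline the marker IDs would come from tokenizing the marker string with the model's tokenizer, and the resulting labels would be passed alongside input_ids to the trainer.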
Usage
from transformers import AutoModelForCausalLM, AutoTokenizer
model_id = "uchkw/qwen2.5-7b-instruct-sft-v4"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
Training Data / Sources & License (IMPORTANT)
- Primary source datasets:
  - u-10bei/sft_alfworld_trajectory_dataset_v5
  - u-10bei/dbbench_sft_dataset_react_v4
  - u-10bei/dbbench_sft_dataset_react_v3
- Data construction policy (concise):
  - ALFWorld data was converted to strict one-line action supervision (ACTION: ...) with exact matching against AVAILABLE ACTIONS.
  - Added ALF trap-guard augmentation (e.g., "Task succeeded." in observation context) while preserving strict action targets.
  - Added DBBench unknown-column recovery-oriented synthetic rows and duplicated unknown-column cases to stabilize recovery behavior.
  - Mixed training ratio was controlled at approximately ALF:DB = 55:45.
- Dataset scale (this model):
  - Train samples: 36455
  - Validation samples: 1918
  - ALF strict-match in training set: 1.0
  - ALF completion-verb ratio: 0.4711
  - ALF trap ratio: 0.1500
  - ALF toggle rows: 467
  - DB unknown-column rows: 64 -> 128 (before/after duplication)
- Compliance:
- Follow each source dataset card and license terms.
- Follow base model terms of use.
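The strict-match figure reported above (1.0) can be read as the fraction of ALFWorld targets that are a single ACTION: line whose action text exactly equals one of the available actions. A minimal sketch of such a check follows. The row layout (keys "target" and "available_actions") and both function names are hypothetical, since the card does not document the dataset schema or the script used to compute the metric.

```python
from typing import Dict, List, Optional

MARKER = "ACTION:"


def extract_action(target: str) -> Optional[str]:
    """Return the action text if target is exactly one 'ACTION: ...' line, else None."""
    lines = target.strip().splitlines()
    if len(lines) != 1 or not lines[0].startswith(MARKER):
        return None
    action = lines[0][len(MARKER):].strip()
    return action or None


def strict_match_ratio(rows: List[Dict]) -> float:
    """Fraction of rows whose action exactly matches one of the available actions."""
    if not rows:
        return 0.0
    hits = 0
    for row in rows:
        action = extract_action(row["target"])
        if action is not None and action in row["available_actions"]:
            hits += 1
    return hits / len(rows)


# Illustrative rows only; these are not taken from the source datasets.
rows = [
    {"target": "ACTION: go to desk 1",
     "available_actions": ["go to desk 1", "look"]},
    {"target": "ACTION: open drawer 9",
     "available_actions": ["go to desk 1", "look"]},
]
print(strict_match_ratio(rows))  # 0.5: only the first row matches
```

The same `extract_action` helper could also be applied to decoded model output at inference time to reject responses that are not a single valid action line.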