# qwen2.5-7b-instruct-sft-v2
This repository provides a merged full model produced by supervised fine-tuning for ALFWorld-oriented action selection.
## Training Objective
Improve strict action-selection reliability for ALFWorld-style prompts, where the model must output exactly one action line chosen from `AVAILABLE ACTIONS`.
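The strictness requirement can be illustrated with a small check (this is a hypothetical sketch of the output contract, not code from the training pipeline):

```python
# Illustrative check of the strict-output contract: the reply must exactly
# reproduce one candidate line from AVAILABLE ACTIONS, not a paraphrase.
def is_valid_action(reply: str, available_actions: list[str]) -> bool:
    """Return True only if the reply matches one candidate action verbatim."""
    return reply.strip() in {a.strip() for a in available_actions}

actions = ["go to desk 1", "take pencil 2 from shelf 1", "open drawer 3"]
print(is_valid_action("take pencil 2 from shelf 1", actions))  # True
print(is_valid_action("take the pencil", actions))             # False
```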
## Training Configuration
- Method: SFT (Unsloth LoRA) + merge to full model
- Base model ID: `Qwen/Qwen2.5-7B-Instruct`
- LoRA: `r=16, alpha=32, dropout=0.05`
- LoRA target modules: `q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj`
- Max sequence length: 4096
- Max steps: 350
- Epochs: 1
- Learning rate: 1.0e-6
- Per-device train batch size: 1
- Per-device eval batch size: 2
- Gradient accumulation steps: 32
- Effective global batch size: 32
- Warmup ratio: 0.03
- Weight decay: 0.01
- Eval/Save steps: 50 / 50
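As a sanity check on the numbers above, the effective global batch size follows from the per-device batch size and gradient accumulation on a single device. A minimal sketch (the dict keys are illustrative, not the actual training script's config object):

```python
# Hypothetical summary of the hyperparameters listed above as a plain dict.
config = {
    "lora_r": 16,
    "lora_alpha": 32,
    "lora_dropout": 0.05,
    "max_seq_length": 4096,
    "learning_rate": 1.0e-6,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 32,
}

# Effective global batch = per-device batch * grad accumulation (single device).
effective_batch = (
    config["per_device_train_batch_size"] * config["gradient_accumulation_steps"]
)
print(effective_batch)  # 32
```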
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "uchkw/qwen2.5-7b-instruct-sft-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
```
Training Data / Sources & License (IMPORTANT)
- Primary source dataset: `u-10bei/sft_alfworld_trajectory_dataset_v5`
- Data construction policy:
  - Converted trajectories into strict chat-style supervision with one-line action targets (`ACTION: ...`).
  - Trained with exact action matching against candidate action lists to improve action validity.
  - Added synthetic correction-style augmentation for common formatting and target-selection errors.
- Dataset scale (used for this merged model):
  - Train samples: 35950
  - Validation samples: 1892
  - Action coverage was balanced to avoid over-concentration on a few verbs.
- Sequence-length check:
  - No truncation observed at `max_seq_length=4096`.
- Compliance:
  - Follow the source dataset license on each dataset card.
  - Follow the base model's terms of use.
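The conversion policy above (trajectory step in, chat-style sample with a one-line `ACTION:` target out) can be sketched as follows. The field names are hypothetical; the actual dataset schema is documented on the source dataset card:

```python
# Hypothetical conversion of one trajectory step into a chat-style SFT
# sample with a one-line "ACTION: ..." target. Field names are illustrative.
step = {
    "observation": "You see a shelf 1. On it is a pencil 2.",
    "available_actions": ["take pencil 2 from shelf 1", "go to desk 1"],
    "gold_action": "take pencil 2 from shelf 1",
}

sample = {
    "messages": [
        {
            "role": "user",
            "content": step["observation"]
            + "\nAVAILABLE ACTIONS:\n"
            + "\n".join(step["available_actions"]),
        },
        # The supervision target is exactly one action line.
        {"role": "assistant", "content": f"ACTION: {step['gold_action']}"},
    ]
}
print(sample["messages"][1]["content"])  # ACTION: take pencil 2 from shelf 1
```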