# Qwen3-4B AgentBench LoRA (SFT + OPD)
This LoRA adapter was trained in two stages:
- SFT (Supervised Fine-Tuning) on agent trajectory data
- OPD (On-Policy Distillation) from Qwen/Qwen3-30B-A3B-Instruct-2507
## Training Configuration

### Stage 1: SFT
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Dataset: ALFWorld trajectories
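
For reference, a minimal sketch of what a LoRA SFT stage like this could look like with `peft` and `trl`. The LoRA rank/alpha, target modules, dataset file, and output directory below are illustrative assumptions and not the exact training script used for this adapter.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Hypothetical file of ALFWorld agent trajectories, one chat-formatted example per line.
train_ds = load_dataset("json", data_files="alfworld_trajectories.jsonl", split="train")

# Assumed LoRA hyperparameters (not documented in this card).
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="Qwen/Qwen3-4B-Instruct-2507",
    train_dataset=train_ds,
    peft_config=lora,
    args=SFTConfig(output_dir="qwen3-4b-agentbench-sft"),
)
trainer.train()
```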
### Stage 2: OPD
- Teacher model: Qwen/Qwen3-30B-A3B-Instruct-2507
- Dataset: ALFWorld + DBBench (combined)
- Steps: 100
- Learning rate: 1e-05
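
On-policy distillation trains the student on trajectories it samples itself, with the frozen teacher scoring every token. The card does not state the exact divergence used; the sketch below assumes the per-token reverse-KL formulation commonly used for OPD, given logits from both models over the same student-generated sequence.

```python
import torch
import torch.nn.functional as F

def opd_loss(student_logits, teacher_logits, response_mask):
    """Per-token reverse KL(student || teacher) over student-sampled response tokens.

    student_logits, teacher_logits: (batch, seq_len, vocab) scored on the same
    student-generated trajectory; response_mask: (batch, seq_len) with 1 on
    response tokens and 0 on prompt/padding positions.
    """
    student_logp = F.log_softmax(student_logits, dim=-1)
    teacher_logp = F.log_softmax(teacher_logits, dim=-1)
    # KL(student || teacher) at each position, summed over the vocabulary.
    kl = (student_logp.exp() * (student_logp - teacher_logp)).sum(-1)
    # Average only over response tokens.
    return (kl * response_mask).sum() / response_mask.sum().clamp(min=1)
```

In training, only the student (LoRA) parameters receive gradients; the teacher, Qwen/Qwen3-30B-A3B-Instruct-2507, stays frozen, with the 100 optimizer steps and 1e-05 learning rate listed above.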
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "Kaito-F/qwen3-4b-agentbench-opd-adapter-v2-sample"

# Load the base model and tokenizer, then attach the LoRA adapter.
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
```
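
Once the adapter is attached, the model generates like any Qwen3 chat model. The prompt and generation settings below are only an illustration:

```python
messages = [
    {"role": "user", "content": "You are in a kitchen. Put a clean mug on the coffee table."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```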