---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/sft_alfworld_trajectory_dataset_v2
- u-10bei/sft_alfworld_trajectory_dataset_v3
- u-10bei/sft_alfworld_trajectory_dataset_v9
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react
- u-10bei/dbbench_sft_dataset_react_v2
- u-10bei/dbbench_sft_dataset_react_v3
- u-10bei/dbbench_sft_dataset_react_v9
language:
- en
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- agent
- alfworld
- dbbench
- agentbench
---

# Qwen3-4B Agent SFT v9 (All Datasets + Optimized)
A LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 with Unsloth. This repository contains the LoRA adapter weights only; the base model must be loaded separately.
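Since only the adapter weights are stored here, usage follows the standard `peft` pattern of loading the base model first and attaching the adapter on top. A minimal sketch, assuming `transformers` and `peft` are installed; `ADAPTER_ID` is a placeholder for this repository's Hub id:

```python
BASE_ID = "Qwen/Qwen3-4B-Instruct-2507"
# Placeholder: replace with this repository's Hub id.
ADAPTER_ID = "<this-repo-id>"


def load_model():
    """Load the base model and attach the LoRA adapter on top of it."""
    # Imports deferred so the file can be imported without downloading weights.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
    base = AutoModelForCausalLM.from_pretrained(
        BASE_ID, torch_dtype="auto", device_map="auto"
    )
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 256) -> str:
    """Run one chat-formatted generation with the adapted model."""
    tokenizer, model = load_model()
    messages = [{"role": "user", "content": prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    out = model.generate(inputs, max_new_tokens=max_new_tokens)
    # Decode only the newly generated tokens, not the prompt.
    return tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True)
```

For deployment, the adapter can also be merged into the base weights with `model.merge_and_unload()` to remove the `peft` dependency at inference time.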
## Training Configuration
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA (full precision) + Unsloth
- Datasets: ALFWorld v2-v5 (deduplicated, EEF) + DBBench v1-v9 (deduplicated, 2x upsampled)
- Max sequence length: 4096
- Epochs: 1
- Learning rate: 2e-6
- LoRA: r=64, alpha=128
- Scheduler: cosine with 10% warmup
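The hyperparameters above can be collected into a single config for reference. A hedged sketch (key names mirror the card, not an actual saved training config); note that with alpha=128 and r=64 the LoRA update is scaled by alpha / r = 2:

```python
# Hyperparameters as listed on this card, gathered into one dict.
config = {
    "base_model": "Qwen/Qwen3-4B-Instruct-2507",
    "max_seq_length": 4096,
    "num_train_epochs": 1,
    "learning_rate": 2e-6,
    "lora_r": 64,
    "lora_alpha": 128,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.10,
}

# The effective multiplier applied to the LoRA update is alpha / r.
lora_scaling = config["lora_alpha"] / config["lora_r"]
print(lora_scaling)  # → 2.0
```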
## Sources & Terms
The training datasets are released under the MIT License; this adapter is released under the Apache-2.0 license.