---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
  - u-10bei/sft_alfworld_trajectory_dataset_v2
  - u-10bei/sft_alfworld_trajectory_dataset_v3
  - u-10bei/sft_alfworld_trajectory_dataset_v9
  - u-10bei/sft_alfworld_trajectory_dataset_v5
  - u-10bei/dbbench_sft_dataset_react
  - u-10bei/dbbench_sft_dataset_react_v2
  - u-10bei/dbbench_sft_dataset_react_v3
  - u-10bei/dbbench_sft_dataset_react_v9
language:
  - en
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
  - lora
  - agent
  - alfworld
  - dbbench
  - agentbench
---

# Qwen3-4B Agent SFT v9 (All Datasets + Optimized)

LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 with Unsloth. This repository contains the LoRA adapter weights only; the base model must be loaded separately.
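A minimal loading sketch with `transformers` and `peft`. The adapter repository id (`Chattso-GPT/adv-sft-v9`) is inferred from this page and may need to be adjusted; the prompt text is illustrative only:

```python
# Sketch: load the base model, then attach this LoRA adapter on top.
# "Chattso-GPT/adv-sft-v9" is an assumed Hub id for this repository.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3-4B-Instruct-2507"
base = AutoModelForCausalLM.from_pretrained(base_id)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Adapter weights only live in this repo; PeftModel merges them at load time.
model = PeftModel.from_pretrained(base, "Chattso-GPT/adv-sft-v9")

messages = [{"role": "user", "content": "You are in a kitchen. Find the mug."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
out = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```

For faster inference the adapter can also be merged into the base weights with `model.merge_and_unload()`.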

## Training Configuration

- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA (full precision) + Unsloth
- Datasets: ALFWorld v2-v5 (deduplicated, EEF) + DBBench v1-v9 (deduplicated, 2x upsampled)
- Max sequence length: 4096
- Epochs: 1
- Learning rate: 2e-6
- LoRA: r=64, alpha=128
- Scheduler: cosine with 10% warmup
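As a sanity check on the r=64, alpha=128 pair above: the LoRA update is applied as `W + (alpha/r) * B @ A`, so this config scales the low-rank update by 2.0. A toy NumPy sketch (matrix shapes are illustrative, not the model's):

```python
import numpy as np

# Toy LoRA update: W_eff = W + (alpha / r) * B @ A.
# r=64 and alpha=128 match the training config; d_out/d_in are made up.
r, alpha = 64, 128
d_out, d_in = 256, 256
rng = np.random.default_rng(0)

W = rng.standard_normal((d_out, d_in))      # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01   # low-rank down-projection
B = np.zeros((d_out, r))                    # up-projection, zero-initialized

scaling = alpha / r                         # 2.0 for this config
W_eff = W + scaling * (B @ A)

print(scaling)                              # 2.0
print(np.allclose(W_eff, W))                # True: zero-init B leaves W unchanged at step 0
```

Zero-initializing `B` is the standard LoRA trick that makes the adapter a no-op before training begins.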

## Sources & Terms

Dataset license: MIT. The adapter weights themselves are released under Apache-2.0 (see the `license` field in the metadata above).