Agent_try09 / README.md
Shin-YAM's picture
Upload merged Qwen3-4B-Instruct-2507 model (auto-generated README)
9055709 verified
metadata
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
  - u-10bei/sft_alfworld_trajectory_dataset_v5
  - u-10bei/dbbench_sft_dataset_react_v4
language:
  - en
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
  - lora
  - agent
  - tool-use
  - alfworld
  - dbbench

<Qwen/Qwen3-4B-Instruct-2507/LoRA-combined_datasets--highLR--CleansedSQLdatasets-lessLORA>

This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using LoRA + Unsloth.

This repository contains LoRA adapter weights only. The base model must be loaded separately.

Training Objective

This adapter is trained to improve multi-turn agent task performance on ALFWorld (household tasks) and DBBench (database operations).

Loss is applied to all assistant turns in the multi-turn trajectory, enabling the model to learn environment observation, action selection, tool use, and recovery from errors.

The training process ic consist of two steps. First, training for LoRA in order to be adapted to Database SQL, Then Secondary, that for ALF is performed separately. Finally, each LoRA adapter is merged into base model sequentially, LoRA for DB and then that for ALF.

Training Configuration

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Method: LoRA (full precision base)
  • Max sequence length: 2048
  • Epochs: 2
  • Learning rate: 2e-06
  • LoRA: r=64, alpha=128

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "your_id/your-repo"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)

Sources & Terms (IMPORTANT)

Training data: u-10bei/dbbench_sft_dataset_react_v4

Dataset License: MIT License. This dataset is used and distributed under the terms of the MIT License. Compliance: Users must comply with the MIT license (including copyright notice) and the base model's original terms of use.