---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
  - u-10bei/sft_alfworld_trajectory_dataset_v5
language:
  - en
license: mit
library_name: peft
pipeline_tag: text-generation
tags:
  - lora
  - agent
  - tool-use
  - alfworld
---

# LLM Lecture 2025 Advanced Competition (AgentBench: DBBench + ALFWorld)

This repository provides a LoRA adapter for Qwen/Qwen3-4B-Instruct-2507, fine-tuned with Unsloth.

This repository contains LoRA adapter weights only. The base model must be loaded separately.

## Training Objective

This adapter is trained to improve multi-turn agent task performance on ALFWorld (household tasks).
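
For illustration, a hypothetical ALFWorld-style trajectory in chat-message form (the actual schema of u-10bei/sft_alfworld_trajectory_dataset_v5 may differ):

```python
# Hypothetical ALFWorld-style multi-turn trajectory (illustrative only;
# the real dataset schema and wording may differ).
trajectory = [
    {"role": "system", "content": "You are an agent in a household environment."},
    {"role": "user", "content": "Observation: You are in the kitchen. Task: put a clean mug on the table."},
    {"role": "assistant", "content": "Action: take mug 1 from countertop 1"},
    {"role": "user", "content": "Observation: You pick up the mug 1."},
    {"role": "assistant", "content": "Action: clean mug 1 with sinkbasin 1"},
]

# During SFT, loss is computed on every assistant message in the trajectory.
assistant_turns = [m for m in trajectory if m["role"] == "assistant"]
```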

Loss is applied to all assistant turns in the multi-turn trajectory, enabling the model to learn observation grounding, action selection, tool use, and recovery from errors.
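
A minimal sketch of this loss masking, assuming the PyTorch convention where label `-100` is ignored by cross-entropy (token ids and role spans below are made up for illustration; real training tokenizes with the Qwen chat template):

```python
# Hypothetical illustration of assistant-only loss masking.
IGNORE_INDEX = -100  # ignored by torch.nn.CrossEntropyLoss by default

def build_labels(token_ids, spans):
    """spans: list of (start, end, role) over token_ids.
    Loss is kept only on assistant spans; everything else is masked."""
    labels = [IGNORE_INDEX] * len(token_ids)
    for start, end, role in spans:
        if role == "assistant":
            labels[start:end] = token_ids[start:end]
    return labels

tokens = list(range(10))  # stand-in token ids
spans = [(0, 4, "user"), (4, 7, "assistant"), (7, 8, "user"), (8, 10, "assistant")]
labels = build_labels(tokens, spans)
# user tokens are masked to -100; assistant tokens keep their ids
```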

The training data for this run is ALFWorld only (see `datasets` in the YAML metadata). Competition evaluation, run by the organizers, covers AgentBench tasks (DBBench + ALFWorld).

## Training Configuration

- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA (PEFT)
- Max sequence length: 2048
- Epochs: 2
- Learning rate: 1.5e-6
- LoRA: r=64, alpha=128
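
To make the LoRA hyperparameters concrete: with r=64 and alpha=128, the adapter scaling factor is alpha/r = 2.0, and each adapted projection gains r·(d_in + d_out) trainable parameters. A small sketch (the projection size used below is hypothetical, not Qwen3-4B's actual dimensions):

```python
# LoRA replaces a frozen weight W (d_out x d_in) with W + (alpha/r) * B @ A,
# where A is (r x d_in) and B is (d_out x r); only A and B are trained.
r, alpha = 64, 128
scaling = alpha / r  # adapter outputs are scaled by this factor

def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added per adapted projection: A plus B."""
    return r * d_in + d_out * r

# Hypothetical square projection of size 2048 (illustrative only):
params = lora_trainable_params(2048, 2048, r)
```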

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "KOUJI039/structeval-qwen3-4b-sft-try48"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
# Attach the LoRA adapter on top of the frozen base weights.
model = PeftModel.from_pretrained(model, adapter)

# Example generation with the Qwen chat template.
messages = [{"role": "user", "content": "Observation: You are in the kitchen. What do you do?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

## Sources & Terms (IMPORTANT)

Training data:

- u-10bei/sft_alfworld_trajectory_dataset_v5

This repository does NOT redistribute the dataset. Users must comply with the dataset license and base model terms.