---
license: apache-2.0
tags:
  - qwen2
  - llm-advanced-competition-2025
  - react-agent
  - alfworld
  - dbbench
---

# LLM-Advanced-Competition-2025

This repository provides a fully fine-tuned model based on Qwen/Qwen2.5-7B-Instruct, trained in 16-bit precision (BF16).

## Training Objective

This model is trained to improve ReAct-style agent performance on ALFWorld (household tasks) and DBBench (database operations).

Training data includes curated trajectories, distilled data from Qwen/Qwen3-32B, and augmented data targeting specific failure patterns.
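For reference, a ReAct trajectory interleaves model reasoning, environment actions, and resulting observations. The sketch below is illustrative only; the exact prompt template and trajectory format used in training are not specified here, and the `format_react_step` helper and the ALFWorld-style strings are hypothetical.

```python
def format_react_step(thought: str, action: str, observation: str) -> str:
    """Render one ReAct step as Thought/Action/Observation lines."""
    return f"Thought: {thought}\nAction: {action}\nObservation: {observation}"

# Hypothetical ALFWorld-style step, for illustration only
step = format_react_step(
    "The egg is likely in the fridge.",
    "open fridge 1",
    "The fridge 1 is open. In it, you see an egg 1.",
)
print(step)
```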

## Training Data

| Dataset | Count |
| --- | ---: |
| u-10bei/sft_alfworld_trajectory_dataset_v5 | 2,502 |
| u-10bei/dbbench_sft_dataset_react_v4 | 1,200 |
| Distilled (Qwen/Qwen3-32B) | 1,200 |
| ALFWorld augmented | 215 |
| Recovery loop avoidance | 120 |
| No-examine | 155 |
| **Total** | **5,392** |

## Training Configuration

- Base model: Qwen/Qwen2.5-7B-Instruct
- Precision: 16-bit (BF16)
- Epochs: 2
- GPU: A100 80GB

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Sakai0920/LLM-Advanced-Competition-2025"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example generation using the base model's chat template
messages = [{"role": "user", "content": "You are in a kitchen. Your task is to heat an egg."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

## Sources & Terms (IMPORTANT)

- Base model: Qwen/Qwen2.5-7B-Instruct
- Distillation teacher: Qwen/Qwen3-32B
- Compliance: Users must comply with the Apache 2.0 license and the base model's original terms of use.