---
base_model: Qwen/Qwen2.5-7B-Instruct
datasets:
- u-10bei/sft_alfworld_trajectory_dataset_v5
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- agent
- tool-use
- alfworld
- dbbench
---

# qwen25_7b_lora_agentbench_v21

This repository provides a **merged model** fine-tuned from **Qwen/Qwen2.5-7B-Instruct**. The fine-tuning was performed using **LoRA + Unsloth** and the resulting adapter has been merged back into the base model weights.

This repository contains **full model weights**, making it ready for inference without the need to load a separate adapter.

## Training Objective

This model is optimized for **multi-turn agent tasks**, specifically for ALFWorld (household navigation/interaction) and DBBench (database operations).

The training process applied loss to **all assistant turns** in the multi-turn trajectories, allowing the model to learn not just final answers, but also intermediate reasoning (Thought), environment observation processing, action selection, and error recovery.

## Training Configuration

- **Base model:** Qwen/Qwen2.5-7B-Instruct
- **Method:** LoRA (merged post-training)
- **Max sequence length:** 2048
- **Epochs:** 2
- **Learning rate:** 2e-06
- **LoRA Parameters:** r=64, alpha=128

## Usage

This model can be loaded using the standard `transformers` library or deployed with `vLLM` (recommended for evaluation).

### Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "your_hf_id/your_repo_name"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)