---
base_model: Qwen/Qwen2.5-7B-Instruct
datasets:
- u-10bei/dbbench_sft_dataset_react_v4
- u-10bei/sft_alfworld_trajectory_dataset_v5
language:
- en
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- agent
- tool-use
- alfworld
- dbbench
---

# qwen2.5-7b-Instruct-trajectory-lora-second

This repository provides a **LoRA adapter** fine-tuned from **Qwen/Qwen2.5-7B-Instruct** using **LoRA + Unsloth**.

This repository contains **LoRA adapter weights only**. The base model must be loaded separately.

## Training Objective

This adapter is trained to improve **multi-turn agent task performance** on ALFWorld (household tasks) and DBBench (database operations).

Loss is applied to **all assistant turns** in the multi-turn trajectory, enabling the model to learn environment observation, action selection, tool use, and recovery from errors.

## Training Configuration

- Base model: Qwen/Qwen2.5-7B-Instruct
- Method: LoRA (full precision base)
- Max sequence length: 2048
- Epochs: 2
- Learning rate: 1e-05
- LoRA: r=64, alpha=128

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen2.5-7B-Instruct"
adapter = "your_id/your-repo"  # replace with this adapter's repository ID

# Load the tokenizer and the base model first.
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(model, adapter)
```

## Sources & Terms (IMPORTANT)

Training data:
- The training dataset is constructed by merging the following publicly available datasets on Hugging Face:
  - https://huggingface.co/datasets/u-10bei/dbbench_sft_dataset_react_v4
  - https://huggingface.co/datasets/u-10bei/sft_alfworld_trajectory_dataset_v5
- Merge ratio: 6:4 (DBBench:ALFWorld)
- Reformatted into a unified message format
- No additional annotation added
- No semantic modification of the original contents
- This dataset is a derived work from datasets released under the MIT license.
- The original datasets are also distributed under the MIT license.
- All rights belong to the original authors and licensors.

Dataset License: MIT License. This dataset is used and distributed under the terms of the MIT License.

Compliance: Users must comply with the MIT license (including copyright notice) and the base model's original terms of use.
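
The 6:4 (DBBench:ALFWorld) merge listed under "Training data" can be illustrated with a minimal sketch. The exact merging procedure is not documented here, so this is only one possible reading: subsample both sources so their example counts follow the ratio, then shuffle. The function name `mix_by_ratio`, the `seed` parameter, and the toy records are hypothetical and not taken from the original pipeline.

```python
import random


def mix_by_ratio(first, second, ratio=(6, 4), seed=0):
    """Subsample two example lists so their counts follow `ratio`, then shuffle.

    Hypothetical helper: the actual merge used for this adapter may differ.
    """
    a, b = ratio
    # Largest number of whole "ratio units" that both sources can supply.
    units = min(len(first) // a, len(second) // b)
    rng = random.Random(seed)
    picked = rng.sample(first, units * a) + rng.sample(second, units * b)
    rng.shuffle(picked)
    return picked


# Toy stand-ins for the two source datasets (not real records).
dbbench = [{"source": "dbbench", "id": i} for i in range(100)]
alfworld = [{"source": "alfworld", "id": i} for i in range(50)]

mixed = mix_by_ratio(dbbench, alfworld)
counts = {
    name: sum(ex["source"] == name for ex in mixed)
    for name in ("dbbench", "alfworld")
}
print(counts)  # {'dbbench': 72, 'alfworld': 48}
```

With 100 and 50 toy examples, the smaller source limits the mix to 12 ratio units, giving 72 DBBench and 48 ALFWorld examples. Fixing the seed keeps the selection reproducible across runs.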