---
base_model: unsloth/Qwen2.5-7B-Instruct
datasets:
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4
language:
- en
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- agent
- tool-use
- alfworld
- dbbench
---

# Qwen2.5-7B-Agent-Mixed-Trajectory-LoRA

This repository provides a merged model fine-tuned from **unsloth/Qwen2.5-7B-Instruct** using **LoRA + Unsloth**.

## Dataset Construction

Training data was built by mixing and preprocessing two trajectory datasets:

- **ALFWorld** (`u-10bei/sft_alfworld_trajectory_dataset_v5`): 2,327 samples after cleaning
- **DBBench** (`u-10bei/dbbench_sft_dataset_react_v4`): 1,200 samples after cleaning

Category-level upsampling was applied to reinforce weak task types:

| Category | Multiplier |
|---|---|
| ALFWorld multi-object | ×3 |
| ALFWorld cool | ×2 |
| ALFWorld examine | ×1.5 |
| DBBench aggregation-MAX | ×3 |
| DBBench INSERT | ×2 |
| DBBench counting | ×2 |

Final dataset size: **5,169 samples**

## Training Configuration

| Parameter | Value |
|---|---|
| Base model | unsloth/Qwen2.5-7B-Instruct |
| Method | LoRA + Unsloth (Colab Pro A100) |
| Max sequence length | 4096 |
| Epochs | 3 |
| Learning rate | 8e-6 |
| LoRA r / alpha | 64 / 128 |
| Effective batch size | 16 (bs=4 × grad_accum=4) |

## Sources & Terms

The training datasets are released under the MIT License. Users must comply with the MIT License and with the base model's original terms of use.
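The category-level upsampling described in the Dataset Construction section can be sketched as follows. This is a minimal illustration, not the original preprocessing script: the category keys and per-category sample lists are hypothetical, and the handling of the fractional ×1.5 multiplier (keep every sample once, duplicate a random half of the category) is one plausible reading, not a confirmed detail.

```python
import math
import random

# Multipliers from the table above; categories not listed stay at x1.
# The key names here are hypothetical labels, not dataset fields.
MULTIPLIERS = {
    "alfworld_multi_object": 3.0,
    "alfworld_cool": 2.0,
    "alfworld_examine": 1.5,
    "dbbench_aggregation_max": 3.0,
    "dbbench_insert": 2.0,
    "dbbench_counting": 2.0,
}

def upsample(samples_by_category, multipliers, seed=0):
    """Repeat each category's samples by its multiplier.

    For a non-integer multiplier such as 1.5, every sample is kept
    once per whole repeat and a random fraction of the category
    (rounded down) is duplicated once more.
    """
    rng = random.Random(seed)
    mixed = []
    for category, samples in samples_by_category.items():
        m = multipliers.get(category, 1.0)
        whole = int(m)
        frac = m - whole
        mixed.extend(samples * whole)                 # whole repeats
        extra = math.floor(len(samples) * frac)       # fractional part
        mixed.extend(rng.sample(samples, extra))
    rng.shuffle(mixed)
    return mixed
```

For example, a category with 100 samples and a ×1.5 multiplier contributes 150 samples to the mixed dataset, while an unlisted category passes through unchanged.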
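The hyperparameters in the Training Configuration table can be written out as a plain configuration fragment. The parameter names below mirror common TRL/PEFT argument names, but this is only a sketch of the setup under that assumption, not the exact training script used for this model.

```python
# Hyperparameters from the table above, expressed as plain dicts.
# Key names follow TRL/PEFT conventions; this is a sketch, not the
# original Unsloth training code.
LORA_CONFIG = {
    "r": 64,
    "lora_alpha": 128,
}

TRAINING_ARGS = {
    "model_name": "unsloth/Qwen2.5-7B-Instruct",
    "max_seq_length": 4096,
    "num_train_epochs": 3,
    "learning_rate": 8e-6,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
}

# Effective batch size = per-device batch size x gradient accumulation
# steps, which reproduces the value reported in the table.
effective_batch_size = (
    TRAINING_ARGS["per_device_train_batch_size"]
    * TRAINING_ARGS["gradient_accumulation_steps"]
)
```

These values can be passed to a trainer of your choice; the effective batch size of 16 is reached without needing a larger per-device batch, which keeps the setup within a single A100's memory.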