---
base_model: unsloth/Qwen2.5-7B-Instruct
datasets:
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4
language:
- en
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- agent
- tool-use
- alfworld
- dbbench
---

# Qwen2.5-7B-Agent-Mixed-Trajectory-LoRA

This repository provides a merged model fine-tuned from **unsloth/Qwen2.5-7B-Instruct** using **LoRA + Unsloth**.
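
A minimal loading sketch with the standard `transformers` API. The repository id below is a placeholder, not this model's actual Hub path, and the prompt is an invented ALFWorld-style example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute this repository's actual Hub path.
repo_id = "<your-namespace>/Qwen2.5-7B-Agent-Mixed-Trajectory-LoRA"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

# Hypothetical agent-style prompt; the card does not specify a prompt format.
messages = [{"role": "user", "content": "You are in a kitchen. Find a mug and heat it."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```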

## Dataset Construction

Training data was built by mixing and preprocessing two trajectory datasets:

- **ALFWorld** (`u-10bei/sft_alfworld_trajectory_dataset_v5`): 2,327 samples after cleaning
- **DBBench** (`u-10bei/dbbench_sft_dataset_react_v4`): 1,200 samples after cleaning

Category-level upsampling was applied to reinforce weak task types:

| Category | Multiplier |
|---|---|
| ALFWorld multi-object | ×3 |
| ALFWorld cool | ×2 |
| ALFWorld examine | ×1.5 |
| DBBench aggregation-MAX | ×3 |
| DBBench INSERT | ×2 |
| DBBench counting | ×2 |

Final dataset size: **5,169 samples**
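
The fractional multiplier (×1.5) implies partial duplication of a category. One way this can be realised — a sketch, not the actual preprocessing code, with an invented `examine` bucket for illustration — is to repeat the whole category an integer number of times and draw the remaining fraction at random:

```python
import random

def upsample(samples, multiplier, seed=0):
    """Repeat a category's samples by an integer or fractional multiplier.

    The fractional part (e.g. the 0.5 in x1.5) is realised by sampling
    that fraction of the category without replacement.
    """
    whole = int(multiplier)
    frac = multiplier - whole
    out = list(samples) * whole
    if frac > 0:
        out += random.Random(seed).sample(list(samples), round(len(samples) * frac))
    return out

examine = [f"examine-{i}" for i in range(100)]  # hypothetical category bucket
print(len(upsample(examine, 1.5)))  # 150
print(len(upsample(examine, 3)))    # 300
```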

## Training Configuration

| Parameter | Value |
|---|---|
| Base model | unsloth/Qwen2.5-7B-Instruct |
| Method | LoRA + Unsloth (Colab Pro A100) |
| Max sequence length | 4096 |
| Epochs | 3 |
| Learning rate | 8e-6 |
| LoRA r / alpha | 64 / 128 |
| Effective batch size | 16 (bs=4 × grad_accum=4) |
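
The configuration above roughly corresponds to an Unsloth SFT setup like the following sketch. The target modules, dataset wiring, and output directory are assumptions not stated in this card:

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B-Instruct",
    max_seq_length=4096,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=128,
    # Assumed target modules -- the card does not list them.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=mixed_dataset,  # the 5,169-sample mix described above
    args=TrainingArguments(
        per_device_train_batch_size=4,   # bs=4
        gradient_accumulation_steps=4,   # effective batch size 16
        num_train_epochs=3,
        learning_rate=8e-6,
        output_dir="outputs",
    ),
)
trainer.train()
```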

## Sources & Terms

Dataset license: MIT License. Users must comply with the MIT License and the base model's original terms of use.