---
base_model: unsloth/Qwen2.5-7B-Instruct
datasets:
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4
language:
- en
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- agent
- tool-use
- alfworld
- dbbench
---

# Qwen2.5-7B-Agent-Mixed-Trajectory-LoRA

This repository provides a merged model fine-tuned from **unsloth/Qwen2.5-7B-Instruct** using **LoRA + Unsloth**.

## Dataset Construction

Training data was built by mixing and preprocessing two trajectory datasets:

- **ALFWorld** (`u-10bei/sft_alfworld_trajectory_dataset_v5`): 2,327 samples after cleaning
- **DBBench** (`u-10bei/dbbench_sft_dataset_react_v4`): 1,200 samples after cleaning

Category-level upsampling was applied to reinforce weak task types:

| Category | Multiplier |
|---|---|
| ALFWorld multi-object | ×3 |
| ALFWorld cool | ×2 |
| ALFWorld examine | ×1.5 |
| DBBench aggregation-MAX | ×3 |
| DBBench INSERT | ×2 |
| DBBench counting | ×2 |

Final dataset size: **5,169 samples**

## Training Configuration

| Parameter | Value |
|---|---|
| Base model | unsloth/Qwen2.5-7B-Instruct |
| Method | LoRA + Unsloth (Colab Pro A100) |
| Max sequence length | 4096 |
| Epochs | 3 |
| Learning rate | 8e-6 |
| LoRA r / alpha | 64 / 128 |
| Effective batch size | 16 (bs=4 × grad_accum=4) |

## Sources & Terms

The training datasets are released under the MIT License. Users must comply with the MIT License and with the base model's original terms of use.
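The category-level upsampling described in the Dataset Construction section can be sketched as follows. This is a minimal illustration, not the original preprocessing script: the category keys and per-category sample lists are hypothetical, and the handling of the fractional ×1.5 multiplier (keep every sample once, duplicate a random half of the category) is one plausible reading, not a confirmed detail.

```python
import math
import random

# Multipliers from the table above; categories not listed stay at x1.
# The key names here are hypothetical labels, not dataset fields.
MULTIPLIERS = {
    "alfworld_multi_object": 3.0,
    "alfworld_cool": 2.0,
    "alfworld_examine": 1.5,
    "dbbench_aggregation_max": 3.0,
    "dbbench_insert": 2.0,
    "dbbench_counting": 2.0,
}

def upsample(samples_by_category, multipliers, seed=0):
    """Repeat each category's samples by its multiplier.

    For a non-integer multiplier such as 1.5, every sample is kept
    once per whole repeat and a random fraction of the category
    (rounded down) is duplicated once more.
    """
    rng = random.Random(seed)
    mixed = []
    for category, samples in samples_by_category.items():
        m = multipliers.get(category, 1.0)
        whole = int(m)
        frac = m - whole
        mixed.extend(samples * whole)                 # whole repeats
        extra = math.floor(len(samples) * frac)       # fractional part
        mixed.extend(rng.sample(samples, extra))
    rng.shuffle(mixed)
    return mixed
```

For example, a category with 100 samples and a ×1.5 multiplier contributes 150 samples to the mixed dataset, while an unlisted category passes through unchanged.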
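The hyperparameters in the Training Configuration table can be written out as a plain configuration fragment. The parameter names below mirror common TRL/PEFT argument names, but this is only a sketch of the setup under that assumption, not the exact training script used for this model.

```python
# Hyperparameters from the table above, expressed as plain dicts.
# Key names follow TRL/PEFT conventions; this is a sketch, not the
# original Unsloth training code.
LORA_CONFIG = {
    "r": 64,
    "lora_alpha": 128,
}

TRAINING_ARGS = {
    "model_name": "unsloth/Qwen2.5-7B-Instruct",
    "max_seq_length": 4096,
    "num_train_epochs": 3,
    "learning_rate": 8e-6,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
}

# Effective batch size = per-device batch size x gradient accumulation
# steps, which reproduces the value reported in the table.
effective_batch_size = (
    TRAINING_ARGS["per_device_train_batch_size"]
    * TRAINING_ARGS["gradient_accumulation_steps"]
)
```

These values can be passed to a trainer of your choice; the effective batch size of 16 is reached without needing a larger per-device batch, which keeps the setup within a single A100's memory.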