---
base_model: unsloth/Qwen2.5-7B-Instruct
datasets:
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4
language:
- en
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- agent
- tool-use
- alfworld
- dbbench
---
# Qwen2.5-7B-Agent-Mixed-Trajectory-LoRA
This repository provides a merged model fine-tuned from
**unsloth/Qwen2.5-7B-Instruct** using **LoRA + Unsloth**.
## Dataset Construction
Training data was built by mixing and preprocessing two trajectory datasets:
- **ALFWorld** (`u-10bei/sft_alfworld_trajectory_dataset_v5`): 2,327 samples after cleaning
- **DBBench** (`u-10bei/dbbench_sft_dataset_react_v4`): 1,200 samples after cleaning
Category-level upsampling was applied to reinforce weak task types:
| Category | Multiplier |
|---|---|
| ALFWorld multi-object | ×3 |
| ALFWorld cool | ×2 |
| ALFWorld examine | ×1.5 |
| DBBench aggregation-MAX | ×3 |
| DBBench INSERT | ×2 |
| DBBench counting | ×2 |
Final dataset size after upsampling: **5,169 samples**
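One way to realize the fractional multipliers above is to repeat each sample a whole number of times and add one extra copy with the leftover probability (e.g. ×1.5 keeps one full copy plus a second copy half the time). A minimal sketch, where the category labels and the `"category"` field are hypothetical and not taken from the actual datasets:

```python
import random

# Hypothetical category labels -- the real dataset fields may differ.
MULTIPLIERS = {
    "alfworld_multi_object": 3.0,
    "alfworld_cool": 2.0,
    "alfworld_examine": 1.5,
    "dbbench_agg_max": 3.0,
    "dbbench_insert": 2.0,
    "dbbench_counting": 2.0,
}

def upsample(samples, seed=0):
    """Repeat each sample according to its category multiplier.

    The integer part of the multiplier gives the number of guaranteed
    copies; the fractional part is the probability of one extra copy,
    so the expected count matches the multiplier.
    """
    rng = random.Random(seed)
    out = []
    for s in samples:
        m = MULTIPLIERS.get(s["category"], 1.0)
        whole, frac = int(m), m - int(m)
        out.extend([s] * whole)
        if rng.random() < frac:
            out.append(s)
    return out
```

Categories not listed in the table keep a multiplier of 1 and pass through unchanged.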
## Training Configuration
| Parameter | Value |
|---|---|
| Base model | unsloth/Qwen2.5-7B-Instruct |
| Method | LoRA + Unsloth (Colab Pro A100) |
| Max sequence length | 4096 |
| Epochs | 3 |
| Learning rate | 8e-6 |
| LoRA r / alpha | 64 / 128 |
| Effective batch size | 16 (bs=4 × grad_accum=4) |
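The hyperparameters above roughly correspond to the following Unsloth setup. This is a hedged sketch rather than the actual training script: the `target_modules` list, 4-bit loading, and `output_dir` are assumptions not stated in this card.

```python
from unsloth import FastLanguageModel
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B-Instruct",
    max_seq_length=4096,             # max sequence length from the table
    load_in_4bit=True,               # assumption: not stated in this card
)

model = FastLanguageModel.get_peft_model(
    model,
    r=64,                            # LoRA rank
    lora_alpha=128,                  # LoRA alpha
    target_modules=[                 # assumption: typical attention/MLP projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
)

args = TrainingArguments(
    per_device_train_batch_size=4,   # bs=4
    gradient_accumulation_steps=4,   # grad_accum=4 -> effective batch size 16
    num_train_epochs=3,
    learning_rate=8e-6,
    output_dir="outputs",            # assumption
)
```

The training loop itself would pass `model`, `tokenizer`, `args`, and the mixed dataset to a TRL `SFTTrainer`.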
## Sources & Terms
The training datasets are distributed under the MIT License.
Users must comply with both the MIT License and the base model's original terms of use (Apache-2.0).