# Qwen2.5-7B-Agent-Mixed-Trajectory-LoRA
This repository provides a merged model fine-tuned from unsloth/Qwen2.5-7B-Instruct using LoRA + Unsloth.
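Because the LoRA adapters are already merged into the base weights, the model can be loaded directly with `transformers`. A minimal sketch; the repository id below is an assumption (substitute this model's actual Hub id), and no expected output is shown since generation depends on the downloaded weights:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed repo id -- replace with this model's actual Hugging Face Hub id.
repo_id = "u-10bei/Qwen2.5-7B-Agent-Mixed-Trajectory-LoRA"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(
    repo_id, torch_dtype="auto", device_map="auto"
)

# ALFWorld-style prompt, formatted with the Qwen chat template.
messages = [{"role": "user", "content": "You are in a kitchen. Find a mug and cool it."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```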
## Dataset Construction
Training data was built by mixing and preprocessing two trajectory datasets:
- ALFWorld (`u-10bei/sft_alfworld_trajectory_dataset_v5`): 2,327 samples after cleaning
- DBBench (`u-10bei/dbbench_sft_dataset_react_v4`): 1,200 samples after cleaning
Category-level upsampling was applied to reinforce weak task types:
| Category | Multiplier |
|---|---|
| ALFWorld multi-object | ×3 |
| ALFWorld cool | ×2 |
| ALFWorld examine | ×1.5 |
| DBBench aggregation-MAX | ×3 |
| DBBench INSERT | ×2 |
| DBBench counting | ×2 |
Final dataset size: 5,169 samples
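The upsampling above can be sketched as a simple replication step. The category keys and the handling of fractional multipliers (e.g. ×1.5 draws a random half of the pool) are assumptions for illustration, not the card's exact procedure:

```python
import random

# Hypothetical category keys mapped to the multipliers from the table above.
MULTIPLIERS = {
    "alfworld_multi_object": 3.0,
    "alfworld_cool": 2.0,
    "alfworld_examine": 1.5,
    "dbbench_agg_max": 3.0,
    "dbbench_insert": 2.0,
    "dbbench_counting": 2.0,
}

def upsample(samples, multiplier, seed=0):
    """Repeat each sample floor(multiplier) times, then add a random
    subset covering the fractional part (e.g. x1.5 adds half the pool)."""
    whole = int(multiplier)
    frac = multiplier - whole
    out = samples * whole
    if frac > 0:
        rng = random.Random(seed)
        out.extend(rng.sample(samples, round(len(samples) * frac)))
    return out

# Example: x1.5 on a 10-sample category yields 15 samples.
demo = [{"id": i} for i in range(10)]
print(len(upsample(demo, 1.5)))  # 15
```

Categories not listed in the table would be passed through with a ×1 multiplier, which leaves them unchanged.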
## Training Configuration
| Parameter | Value |
|---|---|
| Base model | unsloth/Qwen2.5-7B-Instruct |
| Method | LoRA + Unsloth (Colab Pro A100) |
| Max sequence length | 4096 |
| Epochs | 3 |
| Learning rate | 8e-6 |
| LoRA r / alpha | 64 / 128 |
| Effective batch size | 16 (bs=4 × grad_accum=4) |
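The table above maps onto an Unsloth + TRL training setup roughly as follows. This is a configuration sketch, not the authors' exact script: `load_in_4bit` and the `target_modules` list are assumptions not stated in the card, and `mixed_dataset` stands in for the 5,169-sample dataset built above:

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

# Base model and sequence length from the table.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B-Instruct",
    max_seq_length=4096,
    load_in_4bit=True,  # assumption: not stated in the card
)

# LoRA r / alpha from the table; target_modules is a common choice, assumed here.
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=mixed_dataset,  # the 5,169-sample mixed dataset
    args=TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,  # effective batch size = 4 x 4 = 16
        num_train_epochs=3,
        learning_rate=8e-6,
        output_dir="outputs",
    ),
)
trainer.train()
```

After training, the adapters would be merged into the base weights before upload, since this repository ships a merged model.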
## Sources & Terms
Dataset license: MIT. Users must comply with the MIT license terms and with the base model's original terms of use.