# Qwen2.5-7B-Agent-Mixed-Trajectory-LoRA

This repository provides a merged model fine-tuned from unsloth/Qwen2.5-7B-Instruct using LoRA + Unsloth.

## Dataset Construction

Training data was built by mixing and preprocessing two trajectory datasets:

- ALFWorld (`u-10bei/sft_alfworld_trajectory_dataset_v5`): 2,327 samples after cleaning
- DBBench (`u-10bei/dbbench_sft_dataset_react_v4`): 1,200 samples after cleaning

Category-level upsampling was applied to reinforce weak task types:

| Category | Multiplier |
|---|---|
| ALFWorld multi-object | ×3 |
| ALFWorld cool | ×2 |
| ALFWorld examine | ×1.5 |
| DBBench aggregation-MAX | ×3 |
| DBBench INSERT | ×2 |
| DBBench counting | ×2 |

Final dataset size: 5,169 samples
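The upsampling above can be sketched in plain Python. This is a minimal illustration, not the actual preprocessing script: the category keys and the handling of the fractional ×1.5 multiplier (keep every sample once, then duplicate a random share) are assumptions.

```python
import random

# Hypothetical category keys; multipliers are taken from the table above.
MULTIPLIERS = {
    "alfworld_multi_object": 3.0,
    "alfworld_cool": 2.0,
    "alfworld_examine": 1.5,
    "dbbench_aggregation_max": 3.0,
    "dbbench_insert": 2.0,
    "dbbench_counting": 2.0,
}

def upsample(samples, multiplier, seed=0):
    """Repeat each sample floor(multiplier) times, then duplicate a
    random fractional share so len(out) == round(len(samples) * multiplier).
    E.g. x1.5 keeps all samples once and duplicates a random half."""
    whole = int(multiplier)
    frac = multiplier - whole
    out = samples * whole
    if frac > 0:
        rng = random.Random(seed)
        out.extend(rng.sample(samples, k=round(len(samples) * frac)))
    return out

def build_mixture(samples_by_category):
    """Concatenate all categories, upsampling each by its multiplier."""
    mixed = []
    for category, samples in samples_by_category.items():
        mixed.extend(upsample(samples, MULTIPLIERS.get(category, 1.0)))
    return mixed
```

Categories not listed in the table keep a ×1 multiplier, so the base samples pass through unchanged.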

## Training Configuration

| Parameter | Value |
|---|---|
| Base model | unsloth/Qwen2.5-7B-Instruct |
| Method | LoRA + Unsloth (Colab Pro A100) |
| Max sequence length | 4096 |
| Epochs | 3 |
| Learning rate | 8e-6 |
| LoRA r / alpha | 64 / 128 |
| Effective batch size | 16 (bs=4 × grad_accum=4) |
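The configuration above maps onto an Unsloth + TRL setup roughly as follows. This is a hedged sketch (a config fragment, not the exact training script): the `target_modules` list, 4-bit loading, and dataset field names are assumptions; only the hyperparameters in the table come from this card.

```python
# Sketch of the training configuration; assumes unsloth and trl are installed.
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B-Instruct",
    max_seq_length=4096,           # from the table
    load_in_4bit=True,             # assumption; typical for Colab A100 LoRA runs
)

model = FastLanguageModel.get_peft_model(
    model,
    r=64,                          # LoRA r from the table
    lora_alpha=128,                # LoRA alpha from the table
    # Assumed projection layers; common choice for Qwen2.5 LoRA fine-tunes.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=mixed_dataset,   # the 5,169-sample mixture described above
    args=TrainingArguments(
        per_device_train_batch_size=4,   # bs=4
        gradient_accumulation_steps=4,   # grad_accum=4 -> effective batch 16
        num_train_epochs=3,
        learning_rate=8e-6,
        bf16=True,
        output_dir="outputs",
    ),
)
trainer.train()
```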

## Sources & Terms

The training datasets are released under the MIT License. Users must comply with the MIT License and with the base model's original terms of use.

