---
base_model: unsloth/Qwen2.5-7B-Instruct
datasets:
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4
language:
- en
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- agent
- tool-use
- alfworld
- dbbench
---

# Qwen2.5-7B-Agent-Mixed-Trajectory-LoRA

This repository provides a merged model fine-tuned from **unsloth/Qwen2.5-7B-Instruct** using **LoRA + Unsloth**.
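
A minimal loading sketch with the standard `transformers` API. The repository id below is a placeholder, not this model's actual Hub path, and the prompt is an invented ALFWorld-style example:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute this repository's actual Hub path.
repo_id = "<your-namespace>/Qwen2.5-7B-Agent-Mixed-Trajectory-LoRA"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")

# Hypothetical agent-style prompt; the card does not specify a prompt format.
messages = [{"role": "user", "content": "You are in a kitchen. Find a mug and heat it."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```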

## Dataset Construction

Training data was built by mixing and preprocessing two trajectory datasets:

- **ALFWorld** (`u-10bei/sft_alfworld_trajectory_dataset_v5`): 2,327 samples after cleaning
- **DBBench** (`u-10bei/dbbench_sft_dataset_react_v4`): 1,200 samples after cleaning

Category-level upsampling was applied to reinforce weak task types:

| Category | Multiplier |
|---|---|
| ALFWorld multi-object | ×3 |
| ALFWorld cool | ×2 |
| ALFWorld examine | ×1.5 |
| DBBench aggregation-MAX | ×3 |
| DBBench INSERT | ×2 |
| DBBench counting | ×2 |

Final dataset size: **5,169 samples**
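
The fractional multiplier (×1.5) implies partial duplication of a category. One way this can be realised — a sketch, not the actual preprocessing code, with an invented `examine` bucket for illustration — is to repeat the whole category an integer number of times and draw the remaining fraction at random:

```python
import random

def upsample(samples, multiplier, seed=0):
    """Repeat a category's samples by an integer or fractional multiplier.

    The fractional part (e.g. the 0.5 in x1.5) is realised by sampling
    that fraction of the category without replacement.
    """
    whole = int(multiplier)
    frac = multiplier - whole
    out = list(samples) * whole
    if frac > 0:
        out += random.Random(seed).sample(list(samples), round(len(samples) * frac))
    return out

examine = [f"examine-{i}" for i in range(100)]  # hypothetical category bucket
print(len(upsample(examine, 1.5)))  # 150
print(len(upsample(examine, 3)))    # 300
```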

## Training Configuration

| Parameter | Value |
|---|---|
| Base model | unsloth/Qwen2.5-7B-Instruct |
| Method | LoRA + Unsloth (Colab Pro A100) |
| Max sequence length | 4096 |
| Epochs | 3 |
| Learning rate | 8e-6 |
| LoRA r / alpha | 64 / 128 |
| Effective batch size | 16 (bs=4 × grad_accum=4) |
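
The configuration above roughly corresponds to an Unsloth SFT setup like the following sketch. The target modules, dataset wiring, and output directory are assumptions not stated in this card:

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen2.5-7B-Instruct",
    max_seq_length=4096,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=64,
    lora_alpha=128,
    # Assumed target modules -- the card does not list them.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=mixed_dataset,  # the 5,169-sample mix described above
    args=TrainingArguments(
        per_device_train_batch_size=4,   # bs=4
        gradient_accumulation_steps=4,   # effective batch size 16
        num_train_epochs=3,
        learning_rate=8e-6,
        output_dir="outputs",
    ),
)
trainer.train()
```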

## Sources & Terms

Dataset license: MIT License. Users must comply with the MIT License and the base model's original terms of use.