---
base_model: Qwen/Qwen2.5-7B-Instruct
datasets:
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4
language:
- en
license: apache-2.0
library_name: transformers
pipeline_tag: text-generation
tags:
- lora
- agent
- tool-use
- alfworld
- dbbench
---

# qwen2.5-7b-alf-dbb-merged-final

This repository provides a **fully merged model** based on **Qwen/Qwen2.5-7B-Instruct**, combining LoRA adapters trained for the ALFWorld and DBBench agent tasks.

## Model Construction Pipeline

1. Train a LoRA adapter on ALFWorld
2. Train a LoRA adapter on DBBench
3. Merge the adapters using `ties` (density=0.1)
4. Apply additional stabilization fine-tuning (LoRA)
5. Merge the final adapter into the base model (see the sketch below)

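A minimal sketch of steps 3 and 5 using PEFT's `add_weighted_adapter` with `combination_type="ties"`. The actual merge tooling is not specified by this card, and the adapter paths, merge weights, and output directory below are illustrative assumptions:

```python
# Illustrative sketch only: adapter paths, weights, and output path are assumptions.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct", torch_dtype="auto"
)

# Load both task adapters onto the same base model (paths are hypothetical).
model = PeftModel.from_pretrained(base, "path/to/alfworld-lora", adapter_name="alfworld")
model.load_adapter("path/to/dbbench-lora", adapter_name="dbbench")

# Step 3: TIES merge with the density reported in this card.
model.add_weighted_adapter(
    adapters=["alfworld", "dbbench"],
    weights=[1.0, 1.0],
    adapter_name="merged",
    combination_type="ties",
    density=0.1,
)
model.set_adapter("merged")

# Step 5: fold the merged adapter into the base weights (no adapter at inference).
merged = model.merge_and_unload()
merged.save_pretrained("qwen2.5-7b-alf-dbb-merged-final")
```
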
This repository contains the **full merged weights (no adapter required)**.

## Final Training Configuration

- Base model: Qwen/Qwen2.5-7B-Instruct
- Merge method: `ties`
- Merge density: 0.1
- Final-stage epochs: 1
- Learning rate: 1e-05
- Final LoRA: r=16, alpha=16
- Max sequence length: 2048

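For reference, a minimal sketch of the final-stage LoRA configuration implied by the values above; the `target_modules` list is an assumption (typical Qwen2.5 attention projections), not something this card specifies:

```python
# Sketch of the final-stage LoRA config; target_modules is an assumption.
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```
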
## Datasets

- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4

Additional distilled datasets were optionally included during training.

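Both datasets are hosted on the Hugging Face Hub; a minimal loading sketch (split names and any preprocessing are not specified by this card):

```python
# Minimal sketch: load both SFT datasets from the Hub.
from datasets import load_dataset

alfworld = load_dataset("u-10bei/sft_alfworld_trajectory_dataset_v5")
dbbench = load_dataset("u-10bei/dbbench_sft_dataset_react_v4")
```
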
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "takayosh/agentbenchfinal"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)
```

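A minimal generation example using the model's chat template; the prompt content and generation settings are illustrative only:

```python
# Illustrative ALFWorld-style prompt; adjust to your task.
messages = [
    {"role": "user", "content": "You are in a kitchen. Find a mug and put it on the table."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
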
## Sources & Terms (IMPORTANT)

Training data:
- u-10bei/sft_alfworld_trajectory_dataset_v5
- u-10bei/dbbench_sft_dataset_react_v4

Dataset license: MIT. The datasets are used and distributed under the terms of the MIT License.
Compliance: users must comply with the MIT License (including retention of the copyright notice) and with the base model's original terms of use.