moushi21
/

agent-bench-dbbench-merged4

Text Generation

text-generation-inference

Model card Files Files and versions

agent-bench-dbbench-merged4 / README.md

moushi21's picture

Update README.md

aafd798 verified 9 days ago

|

history blame contribute delete

2.07 kB

	---
	base_model: Qwen/Qwen3-4B-Instruct-2507
	datasets:
	- u-10bei/dbbench_sft_dataset_react
	- u-10bei/dbbench_sft_dataset_react_v2
	- u-10bei/dbbench_sft_dataset_react_v3
	- u-10bei/dbbench_sft_dataset_react_v4
	language:
	- en
	license: apache-2.0
	library_name: transformers
	pipeline_tag: text-generation
	tags:
	- unsloth
	- agent
	- tool-use
	- dbbench
	---

	# Qwen3-4B-Agent-DBBench-Specialist

	This repository provides a merged full-parameter model (bfloat16) fine-tuned from Qwen/Qwen3-4B-Instruct-2507.

	Instead of a standalone LoRA adapter, this model has been created by merging LoRA weights back into the base model using Unsloth's `merge_and_unload` method. This ensures high-speed inference and easy deployment.

	## Training Objective
	This model is specialized for DBBench trajectory tasks, trained to handle multi-turn environment observations and action selections.

	## Training Configuration

	- Base model: Qwen/Qwen3-4B-Instruct-2507
	- Format: Merged Full Weights (bfloat16)
	- Method: LoRA fine-tuning (Merged via Unsloth `merge_and_unload`)
	- Max sequence length: 4096
	- Steps: 500
	- Learning rate: 5e-07
	- LoRA Parameters during training: r=64, alpha=128
	- Platform: Trained with Unsloth

	## Usage

	Since this is a merged model, you can load it directly like any other Qwen3 model:

	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	model_id = "moushi21/agent-bench-dbbench-merged4"

	tokenizer = AutoTokenizer.from_pretrained(model_id)
	model = AutoModelForCausalLM.from_pretrained(
	model_id,
	torch_dtype=torch.bfloat16,
	device_map="auto"
	)
	```

	## Sources & Terms (IMPORTANT)

	Training data:
	- u-10bei/dbbench_sft_dataset_react
	- u-10bei/dbbench_sft_dataset_react_v2
	- u-10bei/dbbench_sft_dataset_react_v3
	- u-10bei/dbbench_sft_dataset_react_v4

	Dataset License: MIT License. This dataset is used and distributed under the terms of the MIT License.
	Compliance: Users must comply with the MIT license (including copyright notice) and the base model's original terms of use.