| | --- |
| | base_model: Qwen/Qwen3-4B-Instruct-2507 |
| | datasets: |
| | - u-10bei/dbbench_sft_dataset_react |
| | - u-10bei/dbbench_sft_dataset_react_v2 |
| | - u-10bei/dbbench_sft_dataset_react_v3 |
| | - u-10bei/dbbench_sft_dataset_react_v4 |
| | language: |
| | - en |
| | license: apache-2.0 |
| | library_name: transformers |
| | pipeline_tag: text-generation |
| | tags: |
| | - unsloth |
| | - agent |
| | - tool-use |
| | - dbbench |
| | --- |
| | |
| | # Qwen3-4B-Agent-DBBench-Specialist |
| |
|
| | This repository provides a **merged full-parameter model** (bfloat16) fine-tuned from **Qwen/Qwen3-4B-Instruct-2507**. |
| |
|
| | Instead of a standalone LoRA adapter, this model has been created by merging LoRA weights back into the base model using **Unsloth's `merge_and_unload`** method. This ensures high-speed inference and easy deployment. |
| |
|
| | ## Training Objective |
| | This model is specialized for **DBBench trajectory tasks**, trained to handle multi-turn environment observations and action selections. |
| |
|
| | ## Training Configuration |
| |
|
| | - **Base model**: Qwen/Qwen3-4B-Instruct-2507 |
| | - **Format**: Merged Full Weights (bfloat16) |
| | - **Method**: LoRA fine-tuning (Merged via Unsloth `merge_and_unload`) |
| | - **Max sequence length**: 4096 |
| | - **Steps**: 500 |
| | - **Learning rate**: 5e-07 |
| | - **LoRA Parameters during training**: r=64, alpha=128 |
| | - **Platform**: Trained with Unsloth |
| |
|
| | ## Usage |
| |
|
| | Since this is a merged model, you can load it directly like any other Qwen3 model: |
| |
|
| | ```python |
| | from transformers import AutoModelForCausalLM, AutoTokenizer |
| | import torch |
| | |
| | model_id = "moushi21/agent-bench-dbbench-merged4" |
| | |
| | tokenizer = AutoTokenizer.from_pretrained(model_id) |
| | model = AutoModelForCausalLM.from_pretrained( |
| | model_id, |
| | torch_dtype=torch.bfloat16, |
| | device_map="auto" |
| | ) |
| | ``` |
| |
|
| | ## Sources & Terms (IMPORTANT) |
| |
|
| | Training data: |
| | - u-10bei/dbbench_sft_dataset_react |
| | - u-10bei/dbbench_sft_dataset_react_v2 |
| | - u-10bei/dbbench_sft_dataset_react_v3 |
| | - u-10bei/dbbench_sft_dataset_react_v4 |
| | |
| | Dataset License: MIT License. This dataset is used and distributed under the terms of the MIT License. |
| | Compliance: Users must comply with the MIT license (including copyright notice) and the base model's original terms of use. |
| | |