moushi21 committed
Commit aafd798 · verified · 1 Parent(s): c6be58a

Update README.md

Files changed (1):
  1. README.md +21 -28
README.md CHANGED
@@ -8,58 +8,51 @@ datasets:
  language:
  - en
  license: apache-2.0
- library_name: peft
  pipeline_tag: text-generation
  tags:
- - lora
  - agent
  - tool-use
  - dbbench
  ---

- # <qwen3-4b-agent-trajectory-lora>

- This repository provides a **LoRA adapter** fine-tuned from
- **Qwen/Qwen3-4B-Instruct-2507** using **LoRA + Unsloth**.

- This repository contains **LoRA adapter weights only**.
- The base model must be loaded separately.

  ## Training Objective
-
- This adapter is trained to improve **multi-turn agent task performance**
- on ALFWorld (household tasks) and DBBench (database operations).
-
- Loss is applied to **all assistant turns** in the multi-turn trajectory,
- enabling the model to learn environment observation, action selection,
- tool use, and recovery from errors.

  ## Training Configuration

- - Base model: Qwen/Qwen3-4B-Instruct-2507
- - Method: LoRA (full precision base)
- - Max sequence length: 4096
- - Epochs: 1
- - Learning rate: 5e-07
- - LoRA: r=64, alpha=128

  ## Usage

  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
- from peft import PeftModel
  import torch

- base = "Qwen/Qwen3-4B-Instruct-2507"
- adapter = "your_id/your-repo"

- tokenizer = AutoTokenizer.from_pretrained(base)
  model = AutoModelForCausalLM.from_pretrained(
-     base,
-     torch_dtype=torch.float16,
-     device_map="auto",
  )
- model = PeftModel.from_pretrained(model, adapter)
  ```

  ## Sources & Terms (IMPORTANT)
 
  language:
  - en
  license: apache-2.0
+ library_name: transformers
  pipeline_tag: text-generation
  tags:
+ - unsloth
  - agent
  - tool-use
  - dbbench
  ---

+ # Qwen3-4B-Agent-DBBench-Specialist

+ This repository provides a **merged full-parameter model** (bfloat16) fine-tuned from **Qwen/Qwen3-4B-Instruct-2507**.

+ Instead of a standalone LoRA adapter, this model was created by merging the LoRA weights back into the base model with **Unsloth's `merge_and_unload`** method, which enables fast inference and straightforward deployment.

  ## Training Objective
+ This model is specialized for **DBBench trajectory tasks** and is trained to handle multi-turn environment observations and action selection.

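For context, a DBBench-style trajectory alternates assistant actions with environment observations carried in the following turns. The sketch below shows that message layout, with a small helper that picks out the assistant turns (the turns a trainer would typically supervise); all message contents and the query shown are illustrative, not taken from the actual training data:

```python
# Illustrative multi-turn DBBench-style trajectory (hypothetical content,
# not from the real dataset). Each assistant turn is an action; each
# following user turn carries the environment's observation.
trajectory = [
    {"role": "system", "content": "You are an agent that answers questions by querying a SQL database."},
    {"role": "user", "content": "How many rows does the `orders` table contain?"},
    {"role": "assistant", "content": "Action: Operation\n```sql\nSELECT COUNT(*) FROM orders;\n```"},
    {"role": "user", "content": "[(1842,)]"},  # environment observation
    {"role": "assistant", "content": "Final Answer: 1842"},
]

def assistant_turns(messages):
    """Return the assistant messages, i.e. the candidate supervision targets."""
    return [m["content"] for m in messages if m["role"] == "assistant"]

print(len(assistant_turns(trajectory)))  # two assistant actions in this trajectory
```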
  ## Training Configuration

+ - **Base model**: Qwen/Qwen3-4B-Instruct-2507
+ - **Format**: merged full weights (bfloat16)
+ - **Method**: LoRA fine-tuning, merged via Unsloth `merge_and_unload`
+ - **Max sequence length**: 4096
+ - **Steps**: 500
+ - **Learning rate**: 5e-07
+ - **LoRA parameters during training**: r=64, alpha=128
+ - **Platform**: trained with Unsloth

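The merge step folds the low-rank update into the base weights: W_merged = W + (alpha / r) · B · A, so the r=64, alpha=128 configuration above scales the update by 2.0. A dependency-free sketch of that arithmetic on toy matrices (hypothetical helper names; not the actual Unsloth implementation):

```python
# Toy illustration of the LoRA merge arithmetic (not Unsloth's code):
# W_merged = W + (alpha / r) * (B @ A); here alpha / r = 2.0, matching 128 / 64.

def matmul(B, A):
    """Plain nested-list matrix product."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

def merge_lora(W, A, B, r, alpha):
    """Fold the low-rank update B @ A, scaled by alpha / r, into W."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# 2x2 base weight, rank-1 adapter (rank would be 64 in the real config).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]           # r x in_features
B = [[0.5], [0.25]]        # out_features x r
print(merge_lora(W, A, B, r=1, alpha=2))  # -> [[2.0, 2.0], [0.5, 2.0]]
```

After merging, the adapter's contribution lives inside the dense weights, which is why the repository ships full weights and no longer needs `peft` at inference time.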
  ## Usage

+ Since this is a merged model, you can load it directly like any other Qwen3 model:
+
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
  import torch

+ model_id = "moushi21/agent-bench-dbbench-merged4"

+ tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
  )
  ```

  ## Sources & Terms (IMPORTANT)