Upload merged Qwen3-4B-Instruct-2507 model (auto-generated README)
README.md
---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/dbbench_sft_dataset_react_v4
- u-10bei/sft_alfworld_trajectory_dataset_v5
language:
- en
library_name: peft
license: apache-2.0
pipeline_tag: text-generation
tags:
- lora
- peft
- unsloth
- agent
- tool-use
- agentbench
- alfworld
- dbbench
- db-oversampling
---

# qwen3-4b-agentbench-dbalf-lora

This repository provides a **LoRA adapter** fine-tuned from **Qwen/Qwen3-4B-Instruct-2507**
using **LoRA + Unsloth** on **AgentBench-style multi-turn agent trajectories**.

This repository contains **LoRA adapter weights only**.
The base model must be loaded separately.

## Training Objective

This adapter is trained to improve **multi-turn agent task performance** on:

- **DBBench** (database operation / SQL generation trajectories)
- **ALFWorld** (household task trajectories)

Loss is applied to **all assistant turns** in each trajectory, enabling the model to learn:

- environment observation
- action selection
- tool use / operation formatting
- recovery from intermediate errors
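The all-assistant-turns objective can be illustrated with a small sketch. Note that the `role`/`tokens` record layout and the `build_labels` helper are assumptions for illustration, not the actual training code or dataset schema:

```python
# Illustrative sketch of assistant-turn loss masking for multi-turn SFT.
# The record layout ("role"/"tokens") is an assumption, not the dataset schema.
IGNORE_INDEX = -100  # label value excluded from the cross-entropy loss

def build_labels(turns):
    """Concatenate turn tokens; supervise only assistant turns."""
    input_ids, labels = [], []
    for turn in turns:
        ids = turn["tokens"]
        input_ids.extend(ids)
        if turn["role"] == "assistant":
            labels.extend(ids)  # every assistant turn contributes to the loss
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # user/system turns masked

    return input_ids, labels

trajectory = [
    {"role": "user", "tokens": [1, 2, 3]},       # instruction
    {"role": "assistant", "tokens": [4, 5]},     # action 1
    {"role": "user", "tokens": [6]},             # environment observation
    {"role": "assistant", "tokens": [7, 8, 9]},  # action 2 (also supervised)
]
input_ids, labels = build_labels(trajectory)
```

`-100` is the conventional ignore index for PyTorch cross-entropy, which is why masked positions use that value.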

## Training Data

- DBBench dataset: `u-10bei/dbbench_sft_dataset_react_v4`
- ALFWorld dataset: `u-10bei/sft_alfworld_trajectory_dataset_v5`
- Mixing ratio (pre-merge target): **DB:ALF = 1:1**
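The 1:1 pre-merge target can be sketched as a simple interleave. This is illustrative only; `mix_one_to_one` is a hypothetical helper, not the actual merge script:

```python
import itertools

def mix_one_to_one(db_examples, alf_examples):
    """Interleave two sources so the merged set is balanced 1:1."""
    n = min(len(db_examples), len(alf_examples))  # truncate to the smaller source
    return list(itertools.chain.from_iterable(zip(db_examples[:n], alf_examples[:n])))

mixed = mix_one_to_one(["db1", "db2", "db3"], ["alf1", "alf2"])
# mixed == ["db1", "alf1", "db2", "alf2"]
```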

### DB Oversampling (category-aware)

Enabled: **True**

DB category weights used during training-data preparation:

- counting: 6
- comparison: 4
- ranking: 2
- select: 1
- insert: 1
- update: 1
- other: 1
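Category-aware oversampling with these weights can be sketched as weighted duplication. The `category` field on each DB example is an assumption for illustration; the actual preparation script may differ:

```python
# Weights copied from the list above; weighted duplication is one simple way
# to realize category-aware oversampling (an assumption, not the exact script).
CATEGORY_WEIGHTS = {
    "counting": 6, "comparison": 4, "ranking": 2,
    "select": 1, "insert": 1, "update": 1, "other": 1,
}

def oversample(examples):
    """Repeat each example according to its category weight (default 1)."""
    out = []
    for ex in examples:
        out.extend([ex] * CATEGORY_WEIGHTS.get(ex["category"], 1))
    return out

resampled = oversample([{"category": "counting"}, {"category": "select"}])
# 6 copies of the counting example + 1 select example -> 7 records
```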

## Training Configuration

- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA (full-precision base)
- Max sequence length: 2048
- Epochs: 1
- Learning rate: 1e-06
- LoRA: r=64, alpha=128, dropout=0.0
- Per-device train batch size: 2
- Gradient accumulation: 4
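Two quantities implied by these settings, assuming the standard LoRA scaling of alpha / r and the usual effective-batch arithmetic:

```python
# Derived values; illustrative arithmetic only.
per_device_batch = 2
grad_accum = 4
effective_batch_per_device = per_device_batch * grad_accum  # samples per optimizer step

lora_r, lora_alpha = 64, 128
lora_scaling = lora_alpha / lora_r  # standard LoRA scales the low-rank update by alpha/r
# effective_batch_per_device == 8, lora_scaling == 2.0
```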

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "<this adapter repository id>"  # set to this repo's Hub id

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)
model = PeftModel.from_pretrained(model, adapter)
```

## Notes

- This repository is intended for **adapter-only** distribution.
- Please ensure compliance with the **base model license/terms** in addition to this repository's license.
- If you publish evaluation results, it is recommended to report:
  - AgentBench task split / seeds
  - DBBench / ALFWorld mix ratio
  - DB oversampling settings
  - decoding settings

## Sources & Terms (IMPORTANT)

Training data:

- u-10bei/dbbench_sft_dataset_react_v4
- u-10bei/sft_alfworld_trajectory_dataset_v5

Dataset license / terms:

- Please follow the original license and terms of each dataset repository.
- This adapter repository's license (**apache-2.0**) applies to the adapter files in this repository.