AF0815 committed on
Commit d928909 · verified · 1 Parent(s): 3327c76

Upload merged Qwen3-4B-Instruct-2507 model (auto-generated README)

Files changed (1): README.md (+58 −14)
README.md CHANGED
@@ -1,6 +1,7 @@
---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- u-10bei/sft_alfworld_trajectory_dataset_v5
language:
- en
@@ -9,37 +10,67 @@ library_name: peft
pipeline_tag: text-generation
tags:
- lora
- agent
- tool-use
- alfworld
- dbbench
---

- # qwen3-4b-agent-trajectory-lora

- This repository provides a **LoRA adapter** fine-tuned from
- **Qwen/Qwen3-4B-Instruct-2507** using **LoRA + Unsloth**.

This repository contains **LoRA adapter weights only**.
The base model must be loaded separately.

## Training Objective

- This adapter is trained to improve **multi-turn agent task performance**
- on ALFWorld (household tasks) and DBBench (database operations).

- Loss is applied to **all assistant turns** in the multi-turn trajectory,
- enabling the model to learn environment observation, action selection,
- tool use, and recovery from errors.

## Training Configuration

- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA (full precision base)
- Max sequence length: 2048
- - Epochs: 2
- - Learning rate: 2e-06
- - LoRA: r=64, alpha=128

## Usage

@@ -60,9 +91,22 @@ model = AutoModelForCausalLM.from_pretrained(
model = PeftModel.from_pretrained(model, adapter)
```

## Sources & Terms (IMPORTANT)

- Training data: u-10bei/sft_alfworld_trajectory_dataset_v5

- Dataset License: MIT License. This dataset is used and distributed under the terms of the MIT License.
- Compliance: Users must comply with the MIT license (including copyright notice) and the base model's original terms of use.
---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
+ - u-10bei/dbbench_sft_dataset_react_v4
- u-10bei/sft_alfworld_trajectory_dataset_v5
language:
- en

pipeline_tag: text-generation
tags:
- lora
+ - peft
+ - unsloth
- agent
- tool-use
+ - agentbench
- alfworld
- dbbench
+ - db-oversampling
---

+ # qwen3-4b-agentbench-dbalf-lora

+ This repository provides a **LoRA adapter** fine-tuned from **Qwen/Qwen3-4B-Instruct-2507**
+ using **LoRA + Unsloth** for **AgentBench-style multi-turn agent trajectories**.

This repository contains **LoRA adapter weights only**.
The base model must be loaded separately.

## Training Objective

+ This adapter is trained to improve **multi-turn agent task performance** on:
+
+ - **DBBench** (database operation / SQL generation trajectories)
+ - **ALFWorld** (household task trajectories)
+
+ Loss is applied to **all assistant turns** in the trajectory, enabling the model to learn:
+
+ - environment observation
+ - action selection
+ - tool use / operation formatting
+ - recovery from intermediate errors
+
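The "loss on all assistant turns" objective can be sketched as label masking over the concatenated multi-turn trajectory. This is a minimal illustration with a hypothetical `build_labels` helper, not the repository's actual training code; in practice the masking is applied to the tokenized chat-templated sequence.

```python
# Minimal sketch of multi-turn SFT label masking: every assistant turn is
# supervised, while user/system/tool-observation tokens are masked out so
# they do not contribute to the loss. `build_labels` is a hypothetical helper.
IGNORE_INDEX = -100  # conventionally ignored by the cross-entropy loss

def build_labels(turns):
    """turns: list of (role, token_ids) pairs for one trajectory."""
    input_ids, labels = [], []
    for role, token_ids in turns:
        input_ids.extend(token_ids)
        if role == "assistant":
            labels.extend(token_ids)                        # supervised turn
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))  # masked turn
    return input_ids, labels
```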
+ ## Training Data
+
+ - DBBench dataset: `u-10bei/dbbench_sft_dataset_react_v4`
+ - ALFWorld dataset: `u-10bei/sft_alfworld_trajectory_dataset_v5`
+ - Mixing ratio (pre-merge target): **DB:ALF = 1:1**
+
+ ### DB Oversampling (category-aware)
+
+ Enabled: **True**
+
+ DB category weights used during training-data preparation:
+
+ - counting: 6
+ - comparison: 4
+ - ranking: 2
+ - select: 1
+ - insert: 1
+ - update: 1
+ - other: 1
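The category weights above can be applied by simple duplication of DBBench examples before the 1:1 DB:ALF merge. A sketch under that assumption (the actual data-preparation script is not shown in this diff):

```python
# Category-aware oversampling sketch: duplicate each DB example `weight`
# times according to its category; unknown categories fall back to "other".
CATEGORY_WEIGHTS = {
    "counting": 6, "comparison": 4, "ranking": 2,
    "select": 1, "insert": 1, "update": 1, "other": 1,
}

def oversample(examples):
    """examples: list of dicts, each with a 'category' key."""
    out = []
    for ex in examples:
        weight = CATEGORY_WEIGHTS.get(ex["category"], CATEGORY_WEIGHTS["other"])
        out.extend([ex] * weight)
    return out
```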

## Training Configuration

- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA (full precision base)
- Max sequence length: 2048
+ - Epochs: 1
+ - Learning rate: 1e-06
+ - LoRA: r=64, alpha=128, dropout=0.0
+ - Per-device train batch size: 2
+ - Gradient accumulation: 4
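For reproduction, the LoRA hyperparameters above map onto a PEFT `LoraConfig` roughly as follows. This is a sketch: `target_modules` is an assumption, since the README does not list which modules the adapter targets.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,            # rank, as reported above
    lora_alpha=128,  # alpha, as reported above
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumption
    task_type="CAUSAL_LM",
)
```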

## Usage

model = PeftModel.from_pretrained(model, adapter)
```
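The diff collapses the start of the usage snippet; a complete load sequence along these lines is implied (the adapter repo id is a placeholder, as it is not stated in the diff):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "<this-adapter-repo-id>"  # placeholder: substitute this repository's id

# Load the full-precision base model, then attach the LoRA adapter on top.
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter)
```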

+ ## Notes
+
+ - This repository is intended for **adapter-only** distribution.
+ - Please ensure compliance with the **base model license/terms** in addition to this repository's license.
+ - If you publish evaluation results, it is recommended to report:
+   - AgentBench task split / seeds
+   - DBBench / ALFWorld mix ratio
+   - DB oversampling settings
+   - decoding settings
+

## Sources & Terms (IMPORTANT)

+ Training data:
+ - u-10bei/dbbench_sft_dataset_react_v4
+ - u-10bei/sft_alfworld_trajectory_dataset_v5

+ Dataset license / terms:
+ - Please follow the original license and terms of each dataset repository.
+ - This adapter repository license (**apache-2.0**) applies to the adapter files in this repository.