moushi21 committed
Commit aafd798 · verified · 1 Parent(s): c6be58a

Update README.md

Files changed (1):
  1. README.md +21 -28
README.md CHANGED
@@ -8,58 +8,51 @@ datasets:
  language:
  - en
  license: apache-2.0
- library_name: peft
  pipeline_tag: text-generation
  tags:
- - lora
  - agent
  - tool-use
  - dbbench
  ---

- # <qwen3-4b-agent-trajectory-lora>

- This repository provides a **LoRA adapter** fine-tuned from
- **Qwen/Qwen3-4B-Instruct-2507** using **LoRA + Unsloth**.

- This repository contains **LoRA adapter weights only**.
- The base model must be loaded separately.

  ## Training Objective
-
- This adapter is trained to improve **multi-turn agent task performance**
- on ALFWorld (household tasks) and DBBench (database operations).
-
- Loss is applied to **all assistant turns** in the multi-turn trajectory,
- enabling the model to learn environment observation, action selection,
- tool use, and recovery from errors.

  ## Training Configuration

- - Base model: Qwen/Qwen3-4B-Instruct-2507
- - Method: LoRA (full precision base)
- - Max sequence length: 4096
- - Epochs: 1
- - Learning rate: 5e-07
- - LoRA: r=64, alpha=128

  ## Usage

  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
- from peft import PeftModel
  import torch

- base = "Qwen/Qwen3-4B-Instruct-2507"
- adapter = "your_id/your-repo"

- tokenizer = AutoTokenizer.from_pretrained(base)
  model = AutoModelForCausalLM.from_pretrained(
-     base,
-     torch_dtype=torch.float16,
-     device_map="auto",
  )
- model = PeftModel.from_pretrained(model, adapter)
  ```

  ## Sources & Terms (IMPORTANT)
 
  language:
  - en
  license: apache-2.0
+ library_name: transformers
  pipeline_tag: text-generation
  tags:
+ - unsloth
  - agent
  - tool-use
  - dbbench
  ---

+ # Qwen3-4B-Agent-DBBench-Specialist

+ This repository provides a **merged full-parameter model** (bfloat16) fine-tuned from **Qwen/Qwen3-4B-Instruct-2507**.

+ Instead of a standalone LoRA adapter, this model was created by merging the LoRA weights back into the base model with **Unsloth's `merge_and_unload`** method, which enables fast inference and straightforward deployment.

  ## Training Objective
+ This model is specialized for **DBBench trajectory tasks** and is trained to handle multi-turn environment observations and action selection.

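For context, a DBBench-style trajectory alternates assistant actions with environment observations carried in the following turns. The sketch below shows that message layout, with a small helper that picks out the assistant turns (the turns a trainer would typically supervise); all message contents and the query shown are illustrative, not taken from the actual training data:

```python
# Illustrative multi-turn DBBench-style trajectory (hypothetical content,
# not from the real dataset). Each assistant turn is an action; each
# following user turn carries the environment's observation.
trajectory = [
    {"role": "system", "content": "You are an agent that answers questions by querying a SQL database."},
    {"role": "user", "content": "How many rows does the `orders` table contain?"},
    {"role": "assistant", "content": "Action: Operation\n```sql\nSELECT COUNT(*) FROM orders;\n```"},
    {"role": "user", "content": "[(1842,)]"},  # environment observation
    {"role": "assistant", "content": "Final Answer: 1842"},
]

def assistant_turns(messages):
    """Return the assistant messages, i.e. the candidate supervision targets."""
    return [m["content"] for m in messages if m["role"] == "assistant"]

print(len(assistant_turns(trajectory)))  # two assistant actions in this trajectory
```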
  ## Training Configuration

+ - **Base model**: Qwen/Qwen3-4B-Instruct-2507
+ - **Format**: merged full weights (bfloat16)
+ - **Method**: LoRA fine-tuning, merged via Unsloth `merge_and_unload`
+ - **Max sequence length**: 4096
+ - **Steps**: 500
+ - **Learning rate**: 5e-07
+ - **LoRA parameters during training**: r=64, alpha=128
+ - **Platform**: trained with Unsloth

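The merge step folds the low-rank update into the base weights: W_merged = W + (alpha / r) · B · A, so the r=64, alpha=128 configuration above scales the update by 2.0. A dependency-free sketch of that arithmetic on toy matrices (hypothetical helper names; not the actual Unsloth implementation):

```python
# Toy illustration of the LoRA merge arithmetic (not Unsloth's code):
# W_merged = W + (alpha / r) * (B @ A); here alpha / r = 2.0, matching 128 / 64.

def matmul(B, A):
    """Plain nested-list matrix product."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

def merge_lora(W, A, B, r, alpha):
    """Fold the low-rank update B @ A, scaled by alpha / r, into W."""
    scale = alpha / r
    BA = matmul(B, A)
    return [[W[i][j] + scale * BA[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# 2x2 base weight, rank-1 adapter (rank would be 64 in the real config).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]           # r x in_features
B = [[0.5], [0.25]]        # out_features x r
print(merge_lora(W, A, B, r=1, alpha=2))  # -> [[2.0, 2.0], [0.5, 2.0]]
```

After merging, the adapter's contribution lives inside the dense weights, which is why the repository ships full weights and no longer needs `peft` at inference time.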
  ## Usage

+ Since this is a merged model, you can load it directly like any other Qwen3 model:
+
  ```python
  from transformers import AutoModelForCausalLM, AutoTokenizer
  import torch

+ model_id = "moushi21/agent-bench-dbbench-merged4"

+ tokenizer = AutoTokenizer.from_pretrained(model_id)
  model = AutoModelForCausalLM.from_pretrained(
+     model_id,
+     torch_dtype=torch.bfloat16,
+     device_map="auto",
  )
  ```

  ## Sources & Terms (IMPORTANT)