da1ch812 committed on
Commit 08c7c27 · verified · 1 Parent(s): 7ac893d

Upload merged Qwen3-4B-Instruct-2507 model (auto-generated README)

Files changed (1): README.md (+68 −0)
---
base_model: Qwen/Qwen3-4B-Instruct-2507
datasets:
- ALFWorld1
language:
- en
license: apache-2.0
library_name: peft
pipeline_tag: text-generation
tags:
- lora
- agent
- tool-use
- alfworld
- dbbench
---

# qwen3-4b-agent-trajectory-lora

This repository provides a **LoRA adapter** fine-tuned from
**Qwen/Qwen3-4B-Instruct-2507** using **LoRA + Unsloth**.

It contains the **LoRA adapter weights only**; the base model must be
loaded separately.

## Training Objective

This adapter is trained to improve **multi-turn agent task performance**
on ALFWorld (household tasks) and DBBench (database operations).

Loss is applied to **all assistant turns** in the multi-turn trajectory,
enabling the model to learn environment observation, action selection,
tool use, and recovery from errors.
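
The loss-masking scheme described above can be sketched as follows. This is a minimal illustration with a hypothetical `mask_labels` helper, not code from this repository: tokens outside assistant turns are given the label `-100`, which the cross-entropy loss in common training stacks ignores, so only assistant tokens contribute to the gradient.

```python
# Hypothetical sketch of per-turn loss masking (not from this repo):
# assistant tokens keep their labels; everything else is masked out.
IGNORE_INDEX = -100

def mask_labels(token_ids, roles):
    """roles[i] is the speaker of token i ('user', 'assistant', ...)."""
    return [
        tok if role == "assistant" else IGNORE_INDEX
        for tok, role in zip(token_ids, roles)
    ]

tokens = [101, 7, 8, 9, 42, 43, 102, 10, 11]
roles = ["user"] * 4 + ["assistant"] * 2 + ["user"] * 3
labels = mask_labels(tokens, roles)
# Only the two assistant tokens (42, 43) keep their labels;
# all other positions become -100 and are ignored by the loss.
```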

## Training Configuration

- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA (full-precision base)
- Max sequence length: 2048
- Epochs: 2
- Learning rate: 2e-6
- LoRA: r=64, alpha=128

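With r=64 and alpha=128, the low-rank update is scaled by alpha / r = 2.0 before being added to the frozen base weight. A toy numeric sketch of that update (pure Python, rank 1 for readability; not code from this repository):

```python
# LoRA adds a scaled low-rank correction to a frozen weight:
# W_adapted = W + (alpha / r) * (B @ A).
def lora_delta(A, B, r, alpha):
    """B: (out, r), A: (r, in) as nested lists; returns (alpha/r) * B @ A."""
    scale = alpha / r
    return [
        [scale * sum(B[i][k] * A[k][j] for k in range(r))
         for j in range(len(A[0]))]
        for i in range(len(B))
    ]

A = [[1.0, 2.0]]       # (r=1, in=2)
B = [[0.5], [1.0]]     # (out=2, r=1)
delta = lora_delta(A, B, r=1, alpha=2)  # scale = 2.0, same ratio as 128/64
# delta == [[1.0, 2.0], [2.0, 4.0]]
```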
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "your_id/your-repo"

# Load the base model first, then attach the LoRA adapter on top of it.
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
```
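
Once the adapter is attached, a reply can be generated with the base model's chat template. The snippet below is a sketch: `build_messages` and the system prompt are illustrative assumptions, and the generation calls (shown as comments) follow the standard `transformers` API rather than anything specified by this repository.

```python
# Hypothetical helper: assemble the chat for one agent step.
def build_messages(task_observation):
    return [
        {"role": "system", "content": "You are a household agent in ALFWorld."},
        {"role": "user", "content": task_observation},
    ]

messages = build_messages("You are in the kitchen. Your task: heat an egg.")

# With the tokenizer/model loaded in the snippet above:
# inputs = tokenizer.apply_chat_template(
#     messages, add_generation_prompt=True, return_tensors="pt"
# ).to(model.device)
# output = model.generate(inputs, max_new_tokens=256)
# print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```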

## Sources & Terms (IMPORTANT)

- Training data: ALFWorld1
- Dataset license: MIT License. The dataset is used and distributed under the terms of the MIT License.
- Compliance: Users must comply with the MIT License (including retaining the copyright notice) and with the base model's original terms of use.