UtsuSl0th committed · verified · Commit 96f9f30 · 1 Parent(s): d57722d

Upload folder using huggingface_hub

Files changed (1):
  README.md (+50 −11)
README.md CHANGED

@@ -2,6 +2,7 @@
 base_model: unsloth/Qwen2.5-7B-Instruct
 datasets:
 - u-10bei/sft_alfworld_trajectory_dataset_v5
 language:
 - en
 license: apache-2.0
@@ -15,7 +16,7 @@ tags:
 - dbbench
 ---

-# <[Assignment] Fill this section in yourself

 This repository provides a **LoRA adapter** fine-tuned from
 **unsloth/Qwen2.5-7B-Instruct** using **LoRA + Unsloth**.
@@ -32,14 +33,50 @@ Loss is applied to **all assistant turns** in the multi-turn trajectory,
 enabling the model to learn environment observation, action selection,
 tool use, and recovery from errors.

 ## Training Configuration

-- Base model: unsloth/Qwen2.5-7B-Instruct
-- Method: LoRA (full precision base)
-- Max sequence length: 4096
-- Epochs: 3
-- Learning rate: 8e-06
-- LoRA: r=64, alpha=128

 ## Usage

@@ -49,12 +86,12 @@ from peft import PeftModel
 import torch

 base = "unsloth/Qwen2.5-7B-Instruct"
-adapter = "your_id/your-repo"

 tokenizer = AutoTokenizer.from_pretrained(base)
 model = AutoModelForCausalLM.from_pretrained(
     base,
-    torch_dtype=torch.float16,
     device_map="auto",
 )
 model = PeftModel.from_pretrained(model, adapter)
@@ -62,7 +99,9 @@ model = PeftModel.from_pretrained(model, adapter)

 ## Sources & Terms (IMPORTANT)

-Training data: u-10bei/sft_alfworld_trajectory_dataset_v5

-Dataset License: MIT License. This dataset is used and distributed under the terms of the MIT License.
 Compliance: Users must comply with the MIT license (including copyright notice) and the base model's original terms of use.

 base_model: unsloth/Qwen2.5-7B-Instruct
 datasets:
 - u-10bei/sft_alfworld_trajectory_dataset_v5
+- u-10bei/dbbench_sft_dataset_react_v4
 language:
 - en
 license: apache-2.0

 - dbbench
 ---

+# Qwen2.5-7B-Agent-Mixed-Trajectory-LoRA

 This repository provides a **LoRA adapter** fine-tuned from
 **unsloth/Qwen2.5-7B-Instruct** using **LoRA + Unsloth**.
 
 enabling the model to learn environment observation, action selection,
 tool use, and recovery from errors.

+## Dataset Construction
+
+Training data was built by mixing and preprocessing two trajectory datasets:
+
+- **ALFWorld** (`u-10bei/sft_alfworld_trajectory_dataset_v5`): 2,327 samples after cleaning
+- **DBBench** (`u-10bei/dbbench_sft_dataset_react_v4`): 1,200 samples after cleaning
+
+Preprocessing steps applied:
+1. Structural validation (removes empty / single-turn samples)
+2. Chat template tag contamination removal (`htags` pattern)
+3. Hallucinated object ID removal (ALFWorld only, e.g. `bowl 99`)
+
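The three cleaning steps above can be sketched as a small pure-Python pass. This is a minimal illustration, not the authors' actual pipeline: the sample schema (a `messages` list of role/content dicts), the leaked-tag pattern, and the "two or more digits means hallucinated ID" threshold are all assumptions.

```python
import re

# Chat-template control tags that sometimes leak into trajectory text
# (pattern assumed; the card only names a tag-contamination cleanup step).
TAG_RE = re.compile(r"<\|im_(?:start|end)\|>")

# ALFWorld object IDs are small integers like "bowl 1"; implausibly large
# IDs such as "bowl 99" are treated as hallucinated (threshold assumed).
HALLUCINATED_ID_RE = re.compile(r"\b[a-z]+ \d{2,}\b")

def clean_sample(sample, alfworld=False):
    """Return a cleaned sample, or None if it fails any filter."""
    msgs = sample.get("messages", [])
    # 1. Structural validation: drop empty or single-turn samples.
    if len(msgs) < 2 or any(not m["content"].strip() for m in msgs):
        return None
    cleaned = []
    for m in msgs:
        # 2. Strip leaked chat-template tags from the text.
        text = TAG_RE.sub("", m["content"])
        # 3. ALFWorld only: drop samples mentioning hallucinated object IDs.
        if alfworld and HALLUCINATED_ID_RE.search(text):
            return None
        cleaned.append({**m, "content": text})
    return {**sample, "messages": cleaned}
```

Running each raw sample through `clean_sample` and keeping the non-`None` results would yield the per-dataset counts listed above.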
+Category-level upsampling was applied to reinforce weak task types
+identified from evaluation results of a prior model:
+
+| Category | Multiplier | Reason |
+|---|---|---|
+| ALFWorld multi-object | ×3 | 0% success rate in prior eval |
+| ALFWorld cool | ×2 | 12% success rate |
+| ALFWorld examine | ×1.5 | 12% success rate |
+| DBBench aggregation-MAX | ×3 | 17% accuracy in prior eval |
+| DBBench INSERT | ×2 | 32% accuracy |
+| DBBench counting | ×2 | 36% accuracy |
+
+Final dataset size after mixing and upsampling: **5,169 samples**
+
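Category-level upsampling like the table above amounts to duplicating each sample by its category's multiplier. A minimal sketch, with assumptions flagged: the category label names are invented here, and how fractional multipliers (×1.5) were realized is not stated in the card, so this sketch duplicates the fractional part probabilistically.

```python
import random

# Multipliers from the table above; the category label strings are
# assumed, not taken from the actual dataset schema.
MULTIPLIERS = {
    "alfworld_multi_object": 3.0,
    "alfworld_cool": 2.0,
    "alfworld_examine": 1.5,
    "dbbench_aggregation_max": 3.0,
    "dbbench_insert": 2.0,
    "dbbench_counting": 2.0,
}

def upsample(samples, seed=3407):
    """Duplicate samples per category multiplier.

    A multiplier of 1.5 keeps one full copy plus a second copy with
    probability 0.5 (fractional handling is an assumption).
    """
    rng = random.Random(seed)
    out = []
    for s in samples:
        mult = MULTIPLIERS.get(s.get("category"), 1.0)  # unlisted: keep as-is
        whole, frac = int(mult), mult - int(mult)
        out.extend([s] * whole)
        if frac and rng.random() < frac:
            out.append(s)
    return out
```

Applied after mixing the two cleaned datasets, this produces the final 5,169-sample training set reported above.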
 ## Training Configuration

+| Parameter | Value |
+|---|---|
+| Base model | unsloth/Qwen2.5-7B-Instruct |
+| Method | LoRA + Unsloth (Colab Pro A100) |
+| Max sequence length | 4096 |
+| Epochs | 3 |
+| Learning rate | 8e-6 |
+| LoRA r | 64 |
+| LoRA alpha | 128 |
+| LoRA dropout | 0 |
+| LoRA target modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
+| Per-device batch size | 4 |
+| Gradient accumulation | 4 (effective batch size: 16) |
+| Warmup ratio | 0.1 |
+| Weight decay | 0.05 |
+| Seed | 3407 |
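As a sanity check on the table, the effective batch size and the resulting optimizer-step budget follow directly from the values above. This is a rough calculation that assumes a single GPU (per the Colab A100 note) and ignores any partial-batch dropping by the trainer.

```python
import math

samples = 5169          # final dataset size after mixing and upsampling
per_device_batch = 4
grad_accum = 4
epochs = 3

effective_batch = per_device_batch * grad_accum   # 4 * 4 = 16, as in the table
steps_per_epoch = math.ceil(samples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_steps = int(0.1 * total_steps)             # warmup ratio 0.1

print(effective_batch, total_steps, warmup_steps)  # -> 16 972 97
```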
 
 ## Usage

 import torch

 base = "unsloth/Qwen2.5-7B-Instruct"
+adapter = "UtsuSl0th/your-repo-name"

 tokenizer = AutoTokenizer.from_pretrained(base)
 model = AutoModelForCausalLM.from_pretrained(
     base,
+    torch_dtype=torch.bfloat16,
     device_map="auto",
 )
 model = PeftModel.from_pretrained(model, adapter)

 ## Sources & Terms (IMPORTANT)

+Training data:
+- `u-10bei/sft_alfworld_trajectory_dataset_v5`
+- `u-10bei/dbbench_sft_dataset_react_v4`

+Dataset License: MIT License. These datasets are used and distributed under the terms of the MIT License.
 Compliance: Users must comply with the MIT license (including copyright notice) and the base model's original terms of use.