naru0411 committed on
Commit 2136354 · verified · 1 Parent(s): 7dec6d6

Upload merged Qwen3-4B-Instruct-2507 model (auto-generated README)

README.md CHANGED
@@ -32,20 +32,13 @@ Loss is applied to **all assistant turns** in the multi-turn trajectory,
  enabling the model to learn environment observation, action selection,
  tool use, and recovery from errors.
 
- ## Dataset Processing (Custom Filtering)
-
- To improve the reasoning efficiency and reduce the risk of infinite loops (repetitive actions), the training dataset was customized with the following filtering strategy:
-
- - **Optimization of Exploration**: Trajectories with **9 or more "detours"** were excluded from the training set.
- - **Robustness Maintenance**: Trajectories with **0 to 8 detours** were retained.
-
  ## Training Configuration
 
  - Base model: Qwen/Qwen3-4B-Instruct-2507
  - Method: LoRA (full precision base)
  - Max sequence length: 4096
  - Epochs: 2
- - Learning rate: 1e-06
+ - Learning rate: 2e-06
  - LoRA: r=64, alpha=128
 
  ## Usage
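The detour-based filtering rule described in the README text removed by this hunk (drop trajectories with 9 or more detours, keep those with 0 to 8) could be sketched roughly as follows. This is only an illustration: the commit does not show the actual preprocessing code, and the `num_detours` field, the `MAX_DETOURS` constant, and the `filter_trajectories` helper are hypothetical names. How a "detour" is counted is not defined in the README, so the count is assumed to be precomputed.

```python
from typing import Iterable

# Threshold taken from the README text: 0 to 8 detours retained, 9+ excluded.
MAX_DETOURS = 8


def filter_trajectories(trajectories: Iterable[dict]) -> list[dict]:
    """Keep trajectories whose (hypothetical) num_detours field is <= MAX_DETOURS."""
    return [t for t in trajectories if t["num_detours"] <= MAX_DETOURS]


# Toy records standing in for real trajectories (assumed structure).
sample = [
    {"id": "a", "num_detours": 0},
    {"id": "b", "num_detours": 8},
    {"id": "c", "num_detours": 9},
]
kept = filter_trajectories(sample)
```

With the toy records above, the trajectories with 0 and 8 detours are kept and the one with 9 detours is dropped.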
model-00001-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:43448e0ceca01b8d31c4500c1ece44ffeadf5714708186ca6de2cdb938d214bf
+ oid sha256:8332ed7e005e09d2c4e419963d268b761152aa9074a2d72f3a3c4c6bd962f00d
  size 4967215360
model-00002-of-00002.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:e8c875fd45620ba353b9d7e152b00094f4c7143a72b5dce92c937434353265d0
+ oid sha256:1093693d6dd2d6e9e26bba9980fde43544d67fb56a004dd63461abf6d3634ef1
  size 3077766632
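The safetensors entries in this commit are Git LFS pointer files: each records a SHA-256 `oid` and a byte `size` for the real weight shard. A minimal sketch of checking a downloaded shard against such a pointer, using only the Python standard library, might look like this. The helper names `parse_lfs_pointer` and `matches_pointer` are illustrative and not part of any tool used in this repository; the pointer text is the new pointer for `model-00002-of-00002.safetensors` shown in the diff.

```python
import hashlib
from pathlib import Path

# New pointer contents for model-00002-of-00002.safetensors, copied from the diff.
POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:1093693d6dd2d6e9e26bba9980fde43544d67fb56a004dd63461abf6d3634ef1
size 3077766632
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = dict(
        line.strip().split(" ", 1) for line in text.splitlines() if line.strip()
    )
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the size and hash the pointer records."""
    path = Path(path)
    if path.stat().st_size != pointer["size"]:
        return False  # cheap size check before hashing a multi-gigabyte file
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        # Read in chunks so large shards never need to fit in memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_TEXT)
```

Calling `matches_pointer("model-00002-of-00002.safetensors", pointer)` on a local copy of the shard (the path is an assumption about where the file was downloaded) would then indicate whether the download corresponds to this commit rather than its parent.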