Upload merged Qwen3-4B-Instruct-2507 model (auto-generated README)

Files changed (3)

README.md CHANGED

@@ -32,20 +32,13 @@ Loss is applied to **all assistant turns** in the multi-turn trajectory,
 enabling the model to learn environment observation, action selection,
 tool use, and recovery from errors.
-## Dataset Processing (Custom Filtering)
-To improve the reasoning efficiency and reduce the risk of infinite loops (repetitive actions), the training dataset was customized with the following filtering strategy:
-- **Optimization of Exploration**: Trajectories with **9 or more "detours"** were excluded from the training set.
-- **Robustness Maintenance**: Trajectories with **0 to 8 detours** were retained.
 ## Training Configuration
 - Base model: Qwen/Qwen3-4B-Instruct-2507
 - Method: LoRA (full precision base)
 - Max sequence length: 4096
 - Epochs: 2
-- Learning rate: 1e-06
+- Learning rate: 2e-06
 - LoRA: r=64, alpha=128
 ## Usage

model-00001-of-00002.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:43448e0ceca01b8d31c4500c1ece44ffeadf5714708186ca6de2cdb938d214bf
+oid sha256:8332ed7e005e09d2c4e419963d268b761152aa9074a2d72f3a3c4c6bd962f00d
 size 4967215360

model-00002-of-00002.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:e8c875fd45620ba353b9d7e152b00094f4c7143a72b5dce92c937434353265d0
+oid sha256:1093693d6dd2d6e9e26bba9980fde43544d67fb56a004dd63461abf6d3634ef1
 size 3077766632
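
Both shard sizes are unchanged in this commit, so only the `oid sha256:` lines distinguish the new weights from the old ones. A minimal sketch, assuming the shards have already been downloaded locally, of how a file could be checked against its Git LFS pointer. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of this repository or of any library.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> tuple[str, int]:
    """Return (sha256_hex, size_in_bytes) parsed from Git LFS pointer text."""
    # Each pointer line has the form "<key> <value>".
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    return digest, int(fields["size"])


def matches_pointer(path: str, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """Check that the file at `path` has the size and SHA-256 named in the pointer."""
    expected_digest, expected_size = parse_lfs_pointer(pointer_text)
    # Cheap check first: a size mismatch avoids hashing several gigabytes.
    if os.path.getsize(path) != expected_size:
        return False
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        # Stream the file in chunks so the whole shard is never held in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == expected_digest
```

For example, `matches_pointer("model-00002-of-00002.safetensors", pointer_text)` would be called with `pointer_text` set to the three new pointer lines shown above. The local path is an assumption about where the shard was saved.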