Upload merged Qwen3-4B-Instruct-2507 model (auto-generated README)
- README.md +1 -8
- model-00001-of-00002.safetensors +1 -1
- model-00002-of-00002.safetensors +1 -1
README.md
CHANGED
@@ -32,20 +32,13 @@ Loss is applied to **all assistant turns** in the multi-turn trajectory,
 enabling the model to learn environment observation, action selection,
 tool use, and recovery from errors.
 
-## Dataset Processing (Custom Filtering)
-
-To improve the reasoning efficiency and reduce the risk of infinite loops (repetitive actions), the training dataset was customized with the following filtering strategy:
-
-- **Optimization of Exploration**: Trajectories with **9 or more "detours"** were excluded from the training set.
-- **Robustness Maintenance**: Trajectories with **0 to 8 detours** were retained.
-
 ## Training Configuration
 
 - Base model: Qwen/Qwen3-4B-Instruct-2507
 - Method: LoRA (full precision base)
 - Max sequence length: 4096
 - Epochs: 2
-- Learning rate:
+- Learning rate: 2e-06
 - LoRA: r=64, alpha=128
 
 ## Usage
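The "Dataset Processing" section removed from the README in this hunk describes a simple threshold rule: trajectories with 0 to 8 detours are kept, and those with 9 or more are dropped. A minimal sketch of that rule follows. The `detours` field, the `filter_trajectories` helper, and the sample records are hypothetical, since the diff does not show how detours were counted or how the dataset was stored.

```python
# Hypothetical threshold taken from the README text: keep 0..8 detours.
MAX_DETOURS = 8


def filter_trajectories(trajectories, max_detours=MAX_DETOURS):
    """Return only the trajectories whose detour count is at most max_detours.

    Each trajectory is assumed to be a dict with an integer "detours" field;
    that field name is an illustration, not taken from the original dataset.
    """
    return [t for t in trajectories if t["detours"] <= max_detours]


# Made-up sample records, used only to exercise the boundary at 8 vs. 9.
sample = [
    {"id": "a", "detours": 0},
    {"id": "b", "detours": 8},
    {"id": "c", "detours": 9},
    {"id": "d", "detours": 12},
]

kept = filter_trajectories(sample)
print([t["id"] for t in kept])  # prints ['a', 'b']
```

The boundary is inclusive on the kept side, matching the wording "0 to 8 detours were retained".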
model-00001-of-00002.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:8332ed7e005e09d2c4e419963d268b761152aa9074a2d72f3a3c4c6bd962f00d
 size 4967215360
model-00002-of-00002.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:1093693d6dd2d6e9e26bba9980fde43544d67fb56a004dd63461abf6d3634ef1
 size 3077766632
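The two `.safetensors` entries above are Git LFS pointer files: three `key value` lines giving the spec version, a SHA-256 object ID, and the size in bytes. As a rough sketch of how a downloaded shard could be checked against such a pointer, the snippet below parses the pointer text and compares size and digest. The helper names `parse_lfs_pointer` and `matches_pointer` are made up for illustration, and the check runs on a small in-memory payload rather than on the real shard.

```python
import hashlib
import io


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict (hypothetical helper)."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(stream, pointer, chunk_size=1 << 20):
    """Hash a binary stream in chunks and compare size and SHA-256 digest."""
    hasher = hashlib.sha256()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        hasher.update(chunk)
        total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]


# Pointer text copied from the second shard in this commit.
pointer_text = """version https://git-lfs.github.com/spec/v1
oid sha256:1093693d6dd2d6e9e26bba9980fde43544d67fb56a004dd63461abf6d3634ef1
size 3077766632
"""
pointer = parse_lfs_pointer(pointer_text)
print(pointer["algo"], pointer["size"])  # prints: sha256 3077766632

# Demonstration on a tiny in-memory payload instead of the multi-GB shard.
payload = b"hello"
demo_pointer = {
    "algo": "sha256",
    "digest": hashlib.sha256(payload).hexdigest(),
    "size": len(payload),
}
print(matches_pointer(io.BytesIO(payload), demo_pointer))  # prints: True
```

To check a real shard, the same function could be given a file opened in binary mode, for example `open("model-00002-of-00002.safetensors", "rb")`, together with the parsed pointer.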