Update README.md
README.md CHANGED
@@ -17,12 +17,12 @@ We performed **Reinforcement Learning (RL)** on the **InfiR2-7B-Instruct-FP8** m
 
 | Parameter | Value |
 | :---: | :---: |
-| **Batch Size
+| **Batch Size** | 128 |
 | **N Samples Per Prompt** | 16 |
 | **Global Batch Size** | 2048 |
 | **Maximum Response Length** | 16384 |
 | **Rollout Temperature** | 1.1 |
-| **Learning Rate
+| **Learning Rate** | 1e-6 |
 | **Weight Decay** | 0.1 |
 | **Eps Clip** | 0.2 |
 | **KL Loss Coefficient** | 0.00 |
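
As a quick sanity check on the updated values, the table can be written as a plain Python mapping. This is a minimal sketch, not the project's actual training configuration: the key names and helper functions are hypothetical. The README does not state how the batch sizes relate, so treating Global Batch Size as Batch Size times N Samples Per Prompt is an assumption, though the numbers are consistent with it (128 × 16 = 2048).

```python
# Hyperparameters copied from the updated README table.
# Key names are hypothetical; they are not taken from any training framework.
rl_config = {
    "batch_size": 128,              # prompts per step
    "n_samples_per_prompt": 16,     # responses sampled per prompt
    "global_batch_size": 2048,
    "max_response_length": 16384,
    "rollout_temperature": 1.1,
    "learning_rate": 1e-6,
    "weight_decay": 0.1,
    "eps_clip": 0.2,
    "kl_loss_coef": 0.0,
}


def responses_per_step(config):
    """Prompts per step times responses per prompt (assumed relationship)."""
    return config["batch_size"] * config["n_samples_per_prompt"]


def is_consistent(config):
    """True if the assumed product matches the listed global batch size."""
    return responses_per_step(config) == config["global_batch_size"]
```

Calling `is_consistent(rl_config)` on the values above returns `True`, which suggests the restored **Batch Size** entry agrees with the other rows of the table.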