Update README.md

README.md CHANGED

@@ -17,12 +17,12 @@ We performed **Reinforcement Learning (RL)** on the **InfiR2-7B-Instruct-FP8** m
 
 | Parameter | Value |
 | :---: | :---: |
-| **Batch Size** | 128 |
+| **Batch Size (train\_prompt\_bsz)** | 128 |
 | **N Samples Per Prompt** | 16 |
 | **Global Batch Size** | 2048 |
 | **Maximum Response Length** | 16384 |
 | **Rollout Temperature** | 1.1 |
-| **Learning Rate** | 1e-6 |
+| **Learning Rate (LR)** | 1e-6 |
 | **Weight Decay** | 0.1 |
 | **Eps Clip** | 0.2 |
 | **KL Loss Coefficient** | 0.00 |

@@ -40,7 +40,6 @@ The resulting model is the **InfiR2-R1-7B-FP8**.
 - Stable and Reproducible Performance
 - Efficient and Low memory Training
 
----
 
 ## InfiR2 Model Series
 

@@ -54,7 +53,6 @@ The InfiR2 framework offers multiple variants model with different size and trai
 - [InfiR2-7B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-Instruct-FP8): *Supervised fine-tuning on InfiR2-7B-base-FP8 with [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
 - [InfiR2-R1-7B-FP8](https://huggingface.co/InfiX-ai/InfiR2-R1-7B-FP8): *Reinforcement learning on InfiR2-7B-Instruct-FP8 with dapo dataset*
 
----
 
 ## Model Performance
 Below is the performance comparison of **InfiR2-R1-7B-FP8** on reasoning benchmarks. Note: 'w. InfiAlign' denotes Supervised Fine-Tuning (SFT) using the InfiAlign dataset.

@@ -99,7 +97,6 @@ Below is the performance comparison of **InfiR2-R1-7B-FP8** on reasoning benchma
 
 </div>
 
----
 
 ## Quick Start
 

@@ -149,7 +146,6 @@ print(f"(LLM Response): \n{llm_response}")
 print("="*70)
 ````
 
------
 
 ## Model Download
 

@@ -160,7 +156,6 @@ mkdir -p ./models
 huggingface-cli download --resume-download InfiX-ai/InfiR2-R1-7B-FP8 --local-dir ./models/InfiR2-R1-7B-FP8
 ```
 
------
 
 ## Intended Uses
 

@@ -180,13 +175,11 @@ The model should **not** be used for:
 - Generating harmful, offensive, or inappropriate content
 - Creating misleading information
 
------
 
 ## Acknowledgements
 
 * We would like to express our gratitude for the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine) and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).
 
------
 
 ## Citation
 
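As a quick cross-check of the RL hyperparameter table in the first hunk, the batch settings are internally consistent: 128 prompts per step with 16 samples per prompt yields the stated global batch of 2048. A minimal sketch (the dict keys are illustrative, not the trainer's actual flag names):

```python
# Hypothetical restatement of the README's RL hyperparameter table.
# Key names are illustrative only, not taken from the InfiR2 codebase.
rl_config = {
    "batch_size": 128,            # prompts per step (train_prompt_bsz)
    "n_samples_per_prompt": 16,   # rollouts sampled per prompt
    "global_batch_size": 2048,    # should equal batch_size * n_samples_per_prompt
    "max_response_length": 16384,
    "rollout_temperature": 1.1,
    "learning_rate": 1e-6,
    "weight_decay": 0.1,
    "eps_clip": 0.2,
    "kl_loss_coefficient": 0.00,
}

# Sanity check: the global batch is the prompt batch times samples per prompt.
assert rl_config["batch_size"] * rl_config["n_samples_per_prompt"] == rl_config["global_batch_size"]
print("global batch size:", rl_config["global_batch_size"])  # prints 2048
```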
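The "Eps Clip" (0.2) and "KL Loss Coefficient" (0.00) rows in the hyperparameter table suggest a PPO-style clipped surrogate objective with the KL penalty disabled. A minimal per-sample sketch under that assumption (function names are hypothetical, not the actual InfiR2 training code):

```python
# Hypothetical sketch of a PPO-style clipped objective implied by the
# table's Eps Clip / KL Loss Coefficient values. Illustrative only.
EPS_CLIP = 0.2   # "Eps Clip" from the table
KL_COEF = 0.00   # "KL Loss Coefficient" from the table

def clipped_objective(ratio: float, advantage: float) -> float:
    """Clipped surrogate for one sample: min of unclipped and clipped terms."""
    clipped_ratio = max(1.0 - EPS_CLIP, min(1.0 + EPS_CLIP, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)

def loss(ratio: float, advantage: float, kl: float) -> float:
    # With KL_COEF = 0.00, the KL term contributes nothing to the loss.
    return -clipped_objective(ratio, advantage) + KL_COEF * kl

# A ratio well above 1 + eps is clipped when the advantage is positive:
print(loss(1.5, 1.0, 0.3))  # prints -1.2 (ratio clipped to 1.2)
```

With the KL coefficient at zero the policy is constrained only by the clipping term, which matches the table as written.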