Safetensors · qwen2 · fp8
juezhi committed · Commit 5f3cb04 · verified · 1 Parent(s): 1d161f9

Update README.md

Files changed (1)
  1. README.md +2 -9
README.md CHANGED
@@ -17,12 +17,12 @@ We performed **Reinforcement Learning (RL)** on the **InfiR2-7B-Instruct-FP8** m
 
 | Parameter | Value |
 | :---: | :---: |
-| **Batch Size** | 128 |
+| **Batch Size (train\_prompt\_bsz)** | 128 |
 | **N Samples Per Prompt** | 16 |
 | **Global Batch Size** | 2048 |
 | **Maximum Response Length** | 16384 |
 | **Rollout Temperature** | 1.1 |
-| **Learning Rate** | 1e-6 |
+| **Learning Rate (LR)** | 1e-6 |
 | **Weight Decay** | 0.1 |
 | **Eps Clip** | 0.2 |
 | **KL Loss Coefficient** | 0.00 |
@@ -40,7 +40,6 @@ The resulting model is the **InfiR2-R1-7B-FP8**.
 - Stable and Reproducible Performance
 - Efficient and Low memory Training
 
---
 
 ## 🚀 InfiR2 Model Series
 
@@ -54,7 +53,6 @@ The InfiR2 framework offers multiple variants model with different size and trai
 - [InfiR2-7B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-Instruct-FP8): *Supervised fine-tuning on InfiR2-7B-base-FP8 with [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
 - [InfiR2-R1-7B-FP8](https://huggingface.co/InfiX-ai/InfiR2-R1-7B-FP8): *Reinforcement learning on InfiR2-7B-Instruct-FP8 with dapo dataset*
 
---
 
 ## 📊 Model Performance
 Below is the performance comparison of **InfiR2-R1-7B-FP8** on reasoning benchmarks. Note: 'w. InfiAlign' denotes Supervised Fine-Tuning (SFT) using the InfiAlign dataset.
@@ -99,7 +97,6 @@ Below is the performance comparison of **InfiR2-R1-7B-FP8** on reasoning benchma
 
 </div>
 
---
 
 ## 🎭 Quick Start
 
@@ -149,7 +146,6 @@ print(f"(LLM Response): \n{llm_response}")
 print("="*70)
 ````
 
------
 
 ## 📚 Model Download
 
@@ -160,7 +156,6 @@ mkdir -p ./models
 huggingface-cli download --resume-download InfiX-ai/InfiR2-R1-7B-FP8 --local-dir ./models/InfiR2-R1-7B-FP8
 ```
 
------
 
 ## 🎯 Intended Uses
 
@@ -180,13 +175,11 @@ The model should **not** be used for:
 - Generating harmful, offensive, or inappropriate content
 - Creating misleading information
 
------
 
 ## 🙏 Acknowledgements
 
 * We would like to express our gratitude for the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine) and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).
 
------
 
 ## 📌 Citation
 
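Aside: the two renamed rows in the hyperparameter table clarify a relationship that the original names left implicit — the global batch size is the prompt batch size times the number of rollouts sampled per prompt (128 × 16 = 2048). A minimal sketch of that arithmetic (variable names are illustrative, not taken from the actual training config):

```python
# RL hyperparameters from the table in the diff above.
train_prompt_bsz = 128      # prompts sampled per training step ("Batch Size")
n_samples_per_prompt = 16   # rollouts generated for each prompt
global_batch_size = 2048    # total rollouts consumed per optimizer update

# Every prompt contributes n_samples_per_prompt rollouts, so the
# global batch is exactly prompts x samples-per-prompt.
assert train_prompt_bsz * n_samples_per_prompt == global_batch_size
print(f"{train_prompt_bsz} prompts x {n_samples_per_prompt} samples = "
      f"{global_batch_size} rollouts per step")
```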