Safetensors
qwen2
fp8
baicaihaochi121 committed on
Commit 7720292 · verified · 1 Parent(s): 552f334

Update README.md

Files changed (1):
  1. README.md +33 -46
README.md CHANGED
@@ -2,7 +2,7 @@
 license: apache-2.0
 ---

- # InfiR2-R1-7B-FP8

 <p align="center">
   <a href="https://arxiv.org/abs/2509.22536">📄 Paper</a> &nbsp; | &nbsp;
@@ -10,52 +10,53 @@ license: apache-2.0
   <a href="https://infix-ai.com/research/infir2/">🌐 Project Website</a> &nbsp;
 </p>

- We performed **Reinforcement Learning (RL)** on the **InfiR2-7B-Instruct-FP8** model using the **dapo-math-17k** dataset and the **FP8 format** (inference), with hyperparameters shown below.
-
- <div align="center">
 
 
- | Parameter (stage 2) | Value |
- | :---: | :---: |
- | **Batch Size** | 128 |
- | **N Samples Per Prompt** | 16 |
- | **Global Batch Size** | 2048 |
- | **Maximum Response Length** | 16384 |
- | **Rollout Temperature** | 1.1 |
- | **Learning Rate** | 1e-6 |
- | **Weight Decay** | 0.1 |
- | **Eps Clip** | 0.2 |
- | **KL Loss Coefficient** | 0.00 |

- </div>

- The resulting model is the **InfiR2-R1-7B-FP8**.

 **Training Recipe**:
 <p align="center">
- <img src="fp8_recipe.png" width="100%"/>
 <p>

 - Stable and Reproducible Performance
 - Efficient and Low memory Training

- ## 🚀 InfiR2 Model Series

- The InfiR2 framework offers multiple variants model with different size and training strategy:

- - **1.5B**
-   - [InfiR2-1.5B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-base-FP8): *Continue pretrain on Qwen2.5-1.5B-base*
-   - [InfiR2-1.5B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-Instruct-FP8): *Supervised fine-tuning on InfiR2-1.5B-base-FP8 with [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
- - **7B**
-   - [InfiR2-7B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-base-FP8): *Continue pretrain on Qwen2.5-7B-base*
-   - [InfiR2-7B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-Instruct-FP8): *Supervised fine-tuning on InfiR2-7B-base-FP8 with [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
-   - [InfiR2-R1-7B-FP8](https://huggingface.co/InfiX-ai/InfiR2-R1-7B-FP8): *Reinforcement learning on InfiR2-7B-Instruct-FP8 with dapo dataset*

- ## 📊 Model Performance
- Below is the performance comparison of **InfiR2-R1-7B-FP8** on reasoning benchmarks. Note: 'w. InfiAlign' denotes Supervised Fine-Tuning (SFT) using the InfiAlign dataset.

 <div align="center">

@@ -77,20 +78,6 @@ Below is the performance comparison of **InfiR2-R1-7B-FP8** on reasoning benchma
 <td align="center">48.20</td>
 <td align="center">37.60</td>
 </tr>
- <tr>
- <td align="left"><strong>Qwen2.5-7B-base (w. InfiAlign)</strong></td>
- <td align="center">33.75</td>
- <td align="center">43.02</td>
- <td align="center">48.11</td>
- <td align="center">39.48</td>
- </tr>
- <tr>
- <td align="left"><strong>InfiR2-7B-Instruct-FP8</strong></td>
- <td align="center">40.62</td>
- <td align="center">55.73</td>
- <td align="center">45.33</td>
- <td align="center">40.31</td>
- </tr>
 <tr>
 <td align="left"><strong>InfiR2-R1-7B-FP8</strong></td>
 <td align="center"><strong>53.64</strong></td>
@@ -112,7 +99,7 @@ from vllm import LLM, SamplingParams
 import torch
 import os

- MODEL_NAME = "InfiX-ai/InfiR2-R1-7B-FP8"

 prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."

@@ -159,8 +146,8 @@ print("="*70)
 ```bash
 # Create a directory for models
 mkdir -p ./models
- # Download InfiR2-R1-7B-FP8 model
- huggingface-cli download --resume-download InfiX-ai/InfiR2-R1-7B-FP8 --local-dir ./models/InfiR2-R1-7B-FP8
 ```

 
 license: apache-2.0
 ---

+ # InfiR2-R1-7B-FP8-Preview

 <p align="center">
   <a href="https://arxiv.org/abs/2509.22536">📄 Paper</a> &nbsp; | &nbsp;
   <a href="https://infix-ai.com/research/infir2/">🌐 Project Website</a> &nbsp;
 </p>

+ We performed multi-stage **Reinforcement Learning (RL)** in the FP8 format. More experimental details will be released soon. Stay tuned!
 
 
+ ## 🚀 InfiR2 Model Series

+ The InfiR2 framework offers multiple model variants with different sizes and training strategies:

+ - **1.5B**
+   - [InfiR2-1.5B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-base-FP8): *Continued pretraining of Qwen2.5-1.5B-base*
+   - [InfiR2-1.5B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-Instruct-FP8): *Supervised fine-tuning of InfiR2-1.5B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
+ - **7B**
+   - [InfiR2-7B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-base-FP8): *Continued pretraining of Qwen2.5-7B-base*
+   - [InfiR2-7B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-Instruct-FP8): *Supervised fine-tuning of InfiR2-7B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
+   - [InfiR2-R1-7B-FP8-Preview](https://huggingface.co/InfiX-ai/InfiR2-R1-7B-FP8-Preview): *Multi-stage FP8 reinforcement learning*

 
 **Training Recipe**:
 <p align="center">
+ <img src="fp8_recipe.png" width="60%"/>
 <p>

 - Stable and Reproducible Performance
 - Efficient and Low memory Training

+ ## 📊 Hyperparameters & Model Performance

+ **Training hyperparameters**:

+ <div align="center">

+ | Parameter | Value |
+ | :---: | :---: |
+ | **Batch Size** | 128 |
+ | **N Samples Per Prompt** | 16 |
+ | **Global Batch Size** | 2048 |
+ | **Maximum Response Length** | 16384 |
+ | **Rollout Temperature** | 1.1 |
+ | **Learning Rate** | 1e-6 |
+ | **Weight Decay** | 0.1 |
+ | **Eps Clip** | 0.2 |
+ | **KL Loss Coefficient** | 0.00 |
+
+ </div>
+
+ Below is the performance comparison of **InfiR2-R1-7B-FP8-Preview** on reasoning benchmarks.
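To connect the hyperparameters above to code, here is a minimal, hypothetical sketch of a PPO-style clipped policy-gradient loss using the Eps Clip and KL Loss Coefficient values from the table. This is not the InfiR2 training code; the function name, per-token layout, and KL approximation are assumptions for illustration only:

```python
import math

def clipped_pg_loss(logp_new, logp_old, advantages, eps_clip=0.2, kl_coef=0.0):
    """PPO-style clipped surrogate loss over per-token log-probabilities.

    logp_new / logp_old: log-probs under the current and rollout policies.
    advantages: per-token advantage estimates.
    kl_coef=0.0 mirrors the KL Loss Coefficient in the table above.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                        # importance ratio
        clipped = max(min(ratio, 1.0 + eps_clip), 1.0 - eps_clip)
        surrogate = min(ratio * adv, clipped * adv)      # pessimistic bound
        kl_penalty = kl_coef * (lo - ln)                 # vanishes when kl_coef = 0
        total += -surrogate + kl_penalty
    return total / len(logp_new)

# A ratio of e^0.5 ~ 1.65 exceeds 1 + eps_clip, so with a positive advantage
# the surrogate is capped at 1.2 and stops growing with the ratio.
loss = clipped_pg_loss([0.0, 0.5], [0.0, 0.0], [1.0, 1.0])
print(loss)
```

As a sanity check on the table itself: 128 prompts per batch times 16 samples per prompt gives exactly the global batch size of 2048 sequences.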
 
 <div align="center">

 <td align="center">48.20</td>
 <td align="center">37.60</td>
 </tr>
 <tr>
 <td align="left"><strong>InfiR2-R1-7B-FP8</strong></td>
 <td align="center"><strong>53.64</strong></td>
 
 import torch
 import os

+ MODEL_NAME = "InfiX-ai/InfiR2-R1-7B-FP8-Preview"

 prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."
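The hunk above shows only a fragment of the README's vLLM example. For orientation, here is a minimal, self-contained sketch of how such a snippet typically fits together; the chat-marker helper follows Qwen2's `<|im_start|>` convention, and the sampling values are placeholder assumptions, not the README's actual settings:

```python
def build_chat_prompt(user_text: str) -> str:
    """Wrap a user message in Qwen2-style <|im_start|> chat markers."""
    return (
        "<|im_start|>user\n" + user_text + "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

def generate(model_name: str, user_text: str) -> str:
    # Imported locally so this sketch can be read (and the helper tested)
    # without vLLM installed; generation itself needs a supported GPU.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_name)
    params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=1024)
    outputs = llm.generate([build_chat_prompt(user_text)], params)
    return outputs[0].outputs[0].text

prompt = build_chat_prompt("Briefly explain what a black hole is.")
print(prompt.endswith("<|im_start|>assistant\n"))  # True
```

In practice, `tokenizer.apply_chat_template` from `transformers` is the more robust way to build the prompt string; the manual helper here is only for illustration.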
 
 
 ```bash
 # Create a directory for models
 mkdir -p ./models
+ # Download the InfiR2-R1-7B-FP8-Preview model
+ huggingface-cli download --resume-download InfiX-ai/InfiR2-R1-7B-FP8-Preview --local-dir ./models/InfiR2-R1-7B-FP8-Preview
 ```