---
license: apache-2.0
---
# InfiR2-R1-7B-FP8-Preview
<p align="center">
<a href="https://arxiv.org/abs/2509.22536">Paper</a> |
<a href="https://github.com/InfiXAI/InfiR2">GitHub</a> |
<a href="https://infix-ai.com/research/infir2/">Project Website</a>
</p>
We performed multi-stage FP8 **Reinforcement Learning (RL)**. More experimental details will be released soon. Stay tuned!
## InfiR2 Model Series
The InfiR2 framework offers multiple model variants with different sizes and training strategies:
- **1.5B**
  - [InfiR2-1.5B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-base-FP8): *Continued pretraining on Qwen2.5-1.5B-base*
  - [InfiR2-1.5B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-Instruct-FP8): *Supervised fine-tuning on InfiR2-1.5B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
- **7B**
  - [InfiR2-7B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-base-FP8): *Continued pretraining on Qwen2.5-7B-base*
  - [InfiR2-7B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-Instruct-FP8): *Supervised fine-tuning on InfiR2-7B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
  - [InfiR2-R1-7B-FP8-Preview](https://huggingface.co/InfiX-ai/InfiR2-R1-7B-FP8-Preview): *Multi-stage FP8 Reinforcement Learning*
**Training Recipe**:
<p align="center">
<img src="fp8_recipe.png" width="100%"/>
</p>
- Stable and reproducible performance
- Efficient, low-memory training
## Hyperparameters & Model Performance
**Training hyperparameters**:
<div align="center">
| Parameter | Value |
| :---: | :---: |
| **Batch Size** | 128 |
| **N Samples Per Prompt** | 16 |
| **Global Batch Size** | 2048 |
| **Maximum Response Length** | 16384 |
| **Rollout Temperature** | 1.1 |
| **Learning Rate** | 1e-6 |
| **Weight Decay** | 0.1 |
| **Eps Clip** | 0.2 |
| **KL Loss Coefficient** | 0.00 |
</div>
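
The table values can be collected into a small configuration object to make their relationships explicit. The sketch below is illustrative only and is not the project's training code: the class name `RLConfig` and the function `clipped_surrogate` are hypothetical. It assumes that the global batch size is the batch size multiplied by the number of samples per prompt (which matches 128 × 16 = 2048), and that **Eps Clip** is the clipping range of a PPO-style clipped surrogate objective. The model card does not state either point explicitly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RLConfig:
    """Hypothetical container for the hyperparameters listed in the table."""
    batch_size: int = 128             # prompts per training step
    n_samples_per_prompt: int = 16    # rollouts sampled for each prompt
    max_response_length: int = 16384  # maximum response length
    rollout_temperature: float = 1.1
    learning_rate: float = 1e-6
    weight_decay: float = 0.1
    eps_clip: float = 0.2
    kl_loss_coef: float = 0.0         # 0.0 means no KL term in the loss

    @property
    def global_batch_size(self) -> int:
        # Assumption: global batch = prompts x samples per prompt.
        return self.batch_size * self.n_samples_per_prompt


def clipped_surrogate(ratio: float, advantage: float, eps_clip: float) -> float:
    """PPO-style clipped objective for a single token (to be maximized).

    Assumption: this is how Eps Clip is used; the card does not name
    the exact RL algorithm.
    """
    clipped_ratio = max(1.0 - eps_clip, min(ratio, 1.0 + eps_clip))
    return min(ratio * advantage, clipped_ratio * advantage)


cfg = RLConfig()
print(cfg.global_batch_size)
print(clipped_surrogate(ratio=1.5, advantage=1.0, eps_clip=cfg.eps_clip))
```

With a KL loss coefficient of 0.00, a KL penalty term would contribute nothing to the loss under this reading, so only the clipping range bounds the size of each policy update.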
Below is the performance comparison of **InfiR2-R1-7B-FP8-Preview** on reasoning benchmarks.
<div align="center">
<table>
<thead>
<tr>
<th align="left">Model</th>
<th align="center">AIME 25</th>
<th align="center">AIME 24</th>
<th align="center">GPQA</th>
<th align="center">LiveCodeBench v5</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><strong>Deepseek-Distill-Qwen-7B</strong></td>
<td align="center">43.00</td>
<td align="center">49.00</td>
<td align="center">48.20</td>
<td align="center">37.60</td>
</tr>
<tr>
<td align="left"><strong>InfiR2-R1-7B-FP8-Preview</strong></td>
<td align="center"><strong>53.64</strong></td>
<td align="center"><strong>60.62</strong></td>
<td align="center"><strong>49.18</strong></td>
<td align="center"><strong>39.36</strong></td>
</tr>
</tbody>
</table>
</div>
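
To read the comparison at a glance, the per-benchmark differences can be computed directly from the table. This is a minimal sketch that only restates the reported numbers; the names `BASELINE`, `INFIR2_R1`, and `absolute_gains` are hypothetical, and the differences are simple subtractions of the reported scores.

```python
# Scores copied from the comparison table above (higher is better).
BASELINE = {"AIME 25": 43.00, "AIME 24": 49.00, "GPQA": 48.20, "LiveCodeBench v5": 37.60}
INFIR2_R1 = {"AIME 25": 53.64, "AIME 24": 60.62, "GPQA": 49.18, "LiveCodeBench v5": 39.36}


def absolute_gains(model: dict, baseline: dict) -> dict:
    """Return model score minus baseline score for each benchmark."""
    return {name: round(model[name] - baseline[name], 2) for name in baseline}


gains = absolute_gains(INFIR2_R1, BASELINE)
for name, gain in gains.items():
    print(f"{name}: {gain:+.2f}")
```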
## Quick Start
```python
from vllm import LLM, SamplingParams

MODEL_NAME = "InfiX-ai/InfiR2-R1-7B-FP8-Preview"
prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."
MAX_NEW_TOKENS = 256
TEMPERATURE = 0.8

# Load the model; dtype="auto" derives the dtype from the model config.
llm = LLM(
    model=MODEL_NAME,
    dtype="auto",
)

# A temperature above 0 enables sampling; n=1 returns one completion.
sampling_params = SamplingParams(
    n=1,
    temperature=TEMPERATURE,
    max_tokens=MAX_NEW_TOKENS,
)

# Format the user message with the model's chat template.
tokenizer = llm.get_tokenizer()
messages = [
    {"role": "user", "content": prompt_text}
]
prompt_formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate(
    prompt_formatted,
    sampling_params
)

# Take the first completion of the first (and only) prompt.
generated_text = outputs[0].outputs[0].text
llm_response = generated_text.strip()

print("\n" + "="*70)
print(f"Prompt: \n{prompt_text}")
print("-" * 70)
print(f"(LLM Response): \n{llm_response}")
print("="*70)
```
## Model Download
```bash
# Create a directory for models
mkdir -p ./models
# Download InfiR2-R1-7B-FP8-Preview model
huggingface-cli download --resume-download InfiX-ai/InfiR2-R1-7B-FP8-Preview --local-dir ./models/InfiR2-R1-7B-FP8-Preview
```
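
The same download can also be scripted from Python. The sketch below is an alternative to the CLI command above, not part of the original instructions. It assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function; the helper names `local_model_dir` and `download_model` are hypothetical.

```python
from pathlib import Path


def local_model_dir(repo_id: str, root: str = "./models") -> Path:
    """Map a repo id such as 'org/name' to '<root>/name', as in the CLI example."""
    return Path(root) / repo_id.split("/")[-1]


def download_model(repo_id: str, root: str = "./models") -> str:
    """Download all files of a model repository into a local directory."""
    # Imported lazily so the path helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_model_dir(repo_id, root)
    target.mkdir(parents=True, exist_ok=True)
    return snapshot_download(repo_id=repo_id, local_dir=str(target))
```

Calling `download_model("InfiX-ai/InfiR2-R1-7B-FP8-Preview")` should place the files under `./models/InfiR2-R1-7B-FP8-Preview`. That local path can then be passed as `model=` to vLLM's `LLM` in place of the repository id.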
## Intended Uses
### Direct Use
This model is intended for research and commercial use. Example use cases include:
- Instruction following
- Mathematical reasoning
- Code generation
- General reasoning
### Out-of-Scope Use
The model should **not** be used for:
- Generating harmful, offensive, or inappropriate content
- Creating misleading information
## Acknowledgements
* We would like to thank the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine) and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).
## Citation
If you find our work useful, please cite:
```bibtex
@misc{wang2025infir2comprehensivefp8training,
title={InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models},
author={Wenjun Wang and Shuo Cai and Congkai Xie and Mingfa Feng and Yiming Zhang and Zhen Li and Kejing Yang and Ming Li and Jiannong Cao and Hongxia Yang},
year={2025},
eprint={2509.22536},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2509.22536},
}
```