---
license: apache-2.0
---

# InfiR2-R1-7B-FP8-Preview

<p align="center">
  <a href="https://arxiv.org/abs/2509.22536">📄 Paper</a> &nbsp; | &nbsp;
  <a href="https://github.com/InfiXAI/InfiR2">🐙 GitHub</a> &nbsp; | &nbsp;
  <a href="https://infix-ai.com/research/infir2/">🌐 Project Website</a>
</p>

InfiR2-R1-7B-FP8-Preview is trained with multi-stage FP8 **Reinforcement Learning (RL)**. More experimental details will be released soon. Stay tuned!

## 🚀 InfiR2 Model Series

The InfiR2 framework offers multiple model variants with different sizes and training strategies:

- **1.5B**
  - [InfiR2-1.5B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-base-FP8): *Continued pretraining on Qwen2.5-1.5B-base*
  - [InfiR2-1.5B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-Instruct-FP8): *Supervised fine-tuning of InfiR2-1.5B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
- **7B**
  - [InfiR2-7B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-base-FP8): *Continued pretraining on Qwen2.5-7B-base*
  - [InfiR2-7B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-Instruct-FP8): *Supervised fine-tuning of InfiR2-7B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
  - [InfiR2-R1-7B-FP8-Preview](https://huggingface.co/InfiX-ai/InfiR2-R1-7B-FP8-Preview): *Multi-stage FP8 Reinforcement Learning*



**Training Recipe**:
<p align="center">
    <img src="fp8_recipe.png" width="100%"/>
</p>

- Stable and reproducible performance
- Efficient, low-memory training


## 📊 Hyperparameters & Model Performance

**Training hyperparameters**:

<div align="center">


| Parameter | Value |
| :---: | :---: |
| **Batch Size** | 128 |
| **N Samples Per Prompt** | 16 |
| **Global Batch Size** | 2048 |
| **Maximum Response Length** | 16384 |
| **Rollout Temperature** | 1.1 |
| **Learning Rate** | 1e-6 |
| **Weight Decay** | 0.1 |
| **Eps Clip** | 0.2 |
| **KL Loss Coefficient** | 0.00 |

</div>
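
The batch figures in the table are consistent with one another: 128 × 16 = 2048. The sketch below checks that arithmetic and illustrates how a clip value of 0.2 and a KL coefficient of 0 would behave in a PPO-style clipped surrogate. This card does not name the RL objective, so the PPO/GRPO-style form, the reading of "Batch Size" as prompts per step, and the helper names `clipped_surrogate` and `token_loss` are all assumptions made for illustration.

```python
# Values copied from the hyperparameter table above.
BATCH_SIZE = 128            # assumed to mean prompts per training step
N_SAMPLES_PER_PROMPT = 16   # rollouts sampled for each prompt
EPS_CLIP = 0.2
KL_LOSS_COEF = 0.0

# 128 prompts x 16 rollouts = 2048 responses, matching "Global Batch Size".
global_batch_size = BATCH_SIZE * N_SAMPLES_PER_PROMPT


def clipped_surrogate(ratio: float, advantage: float, eps: float = EPS_CLIP) -> float:
    """PPO-style clipped objective for one token (assumed form, not confirmed by the card)."""
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


def token_loss(ratio: float, advantage: float, kl: float = 0.0) -> float:
    """Hypothetical per-token loss: negated surrogate plus a weighted KL penalty."""
    # With KL_LOSS_COEF = 0.0 the KL term contributes nothing.
    return -clipped_surrogate(ratio, advantage) + KL_LOSS_COEF * kl


print(global_batch_size)
print(clipped_surrogate(1.5, 1.0))
```

Under these assumptions, a probability ratio of 1.5 with a positive advantage is capped at 1 + 0.2, so a single update cannot move the policy arbitrarily far from the rollout policy even though no KL penalty is applied.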

The table below compares **InfiR2-R1-7B-FP8-Preview** with the baseline on reasoning benchmarks.

<div align="center">

<table>
  <thead>
    <tr>
      <th align="left">Model</th>
      <th align="center">AIME 25</th>
      <th align="center">AIME 24</th>
      <th align="center">GPQA</th>
      <th align="center">LiveCodeBench v5</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td align="left"><strong>Deepseek-Distill-Qwen-7B</strong></td>
      <td align="center">43.00</td>
      <td align="center">49.00</td>
      <td align="center">48.20</td>
      <td align="center">37.60</td>
    </tr>
    <tr>
      <td align="left"><strong>InfiR2-R1-7B-FP8-Preview</strong></td>
      <td align="center"><strong>53.64</strong></td>
      <td align="center"><strong>60.62</strong></td>
      <td align="center"><strong>49.18</strong></td>
      <td align="center"><strong>39.36</strong></td>
    </tr>
  </tbody>
</table>

</div>


## 🎭 Quick Start

```python
from vllm import LLM, SamplingParams

MODEL_NAME = "InfiX-ai/InfiR2-R1-7B-FP8-Preview"

prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."

# Generation settings
MAX_NEW_TOKENS = 256
TEMPERATURE = 0.8

# Load the model; dtype="auto" takes the dtype from the model config.
llm = LLM(
    model=MODEL_NAME,
    dtype="auto",
)

sampling_params = SamplingParams(
    n=1,
    temperature=TEMPERATURE,
    max_tokens=MAX_NEW_TOKENS,
)

# Wrap the user prompt in the model's chat template before generation.
tokenizer = llm.get_tokenizer()
messages = [
    {"role": "user", "content": prompt_text}
]
prompt_formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# generate() returns one result per prompt; take its first completion.
outputs = llm.generate(
    prompt_formatted,
    sampling_params
)

generated_text = outputs[0].outputs[0].text
llm_response = generated_text.strip()

print("\n" + "=" * 70)
print(f"Prompt: \n{prompt_text}")
print("-" * 70)
print(f"(LLM Response): \n{llm_response}")
print("=" * 70)
```


## 📚 Model Download

```bash
# Create a directory for models
mkdir -p ./models
# Download InfiR2-R1-7B-FP8-Preview model
huggingface-cli download --resume-download InfiX-ai/InfiR2-R1-7B-FP8-Preview --local-dir ./models/InfiR2-R1-7B-FP8-Preview
```
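
The same download can also be scripted from Python. The following is a minimal sketch, assuming the `huggingface_hub` package is installed; it uses its `snapshot_download` function, and the helper names `local_model_dir` and `download_model` are hypothetical, chosen only to mirror the directory layout of the CLI command above.

```python
from pathlib import Path

REPO_ID = "InfiX-ai/InfiR2-R1-7B-FP8-Preview"


def local_model_dir(root: str = "./models") -> Path:
    """Mirror the CLI layout above: <root>/<model name>."""
    return Path(root) / REPO_ID.split("/")[-1]


def download_model(root: str = "./models") -> str:
    """Download the full repository snapshot and return its local path."""
    # Imported lazily so local_model_dir() works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    target = local_model_dir(root)
    target.mkdir(parents=True, exist_ok=True)
    return snapshot_download(repo_id=REPO_ID, local_dir=str(target))
```

Calling `download_model()` fetches the files into `./models/InfiR2-R1-7B-FP8-Preview`, and the returned path can then be passed as `model=` in the Quick Start snippet in place of the repository name.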


## 🎯 Intended Uses

### ✅ Direct Use

This model is intended for research and commercial use. Example use cases include:

  - Instruction following
  - Mathematical reasoning
  - Code generation
  - General reasoning

### ❌ Out-of-Scope Use

The model should **not** be used for:

  - Generating harmful, offensive, or inappropriate content
  - Creating misleading information


## 🙏 Acknowledgements

  * We would like to express our gratitude for the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine) and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).


## 📌 Citation

If you find our work useful, please cite:

```bibtex
@misc{wang2025infir2comprehensivefp8training,
      title={InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models}, 
      author={Wenjun Wang and Shuo Cai and Congkai Xie and Mingfa Feng and Yiming Zhang and Zhen Li and Kejing Yang and Ming Li and Jiannong Cao and Hongxia Yang},
      year={2025},
      eprint={2509.22536},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.22536},
}
```