---
license: apache-2.0
---
# InfiR2-7B-Instruct-FP8
<p align="center">
  <a href="https://arxiv.org/abs/2509.22536">Paper</a> |
  <a href="https://github.com/InfiXAI/InfiR2">GitHub</a> |
  <a href="https://infix-ai.com/research/infir2/">Project Website</a>
</p>
We performed supervised fine-tuning (SFT) of **InfiR2-7B-base-FP8** in FP8 format, in two stages, using the InfiAlign-SFT-72k and InfiAlign-SFT-165k datasets.
**Training Recipe**:
<p align="center">
<img src="fp8_recipe.png" width="100%"/>
</p>
- Stable and Reproducible Performance
- Efficient, Low-Memory Training
**Hyperparameters**:
<div align="center">
| Parameter | Value |
| :---: | :---: |
| **Batch Size** | 64 |
| **Learning Rate** | 1e-5 |
| **Minimum Learning Rate** | 1e-6 |
| **Weight Decay** | 0.05 |
| **Context Length** | 32k |
</div>
The resulting model is the **InfiR2-7B-Instruct-FP8**.
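The table gives a peak and a minimum learning rate but not the shape of the decay between them. As a minimal sketch, assuming a cosine decay without warmup (an assumption, not something this card states), the two values could be combined as follows. `cosine_lr` and `TOTAL_STEPS` are hypothetical names introduced only for illustration.

```python
import math

PEAK_LR = 1e-5  # "Learning Rate" from the table
MIN_LR = 1e-6   # "Minimum Learning Rate" from the table


def cosine_lr(step: int, total_steps: int) -> float:
    """Cosine decay from PEAK_LR to MIN_LR (assumed schedule shape, no warmup)."""
    # Clamp progress to [0, 1] so steps past the end stay at MIN_LR.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return MIN_LR + 0.5 * (PEAK_LR - MIN_LR) * (1.0 + math.cos(math.pi * progress))


TOTAL_STEPS = 1000  # hypothetical; the card does not state the number of steps
for step in (0, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{cosine_lr(step, TOTAL_STEPS):.2e}")
```

Under this assumed schedule the rate starts at the peak value and ends at the minimum value. A different decay shape or a warmup phase would change the intermediate values only.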
## InfiR2 Model Series
The InfiR2 framework offers multiple model variants with different sizes and training strategies:
- **1.5B**
  - [InfiR2-1.5B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-base-FP8): *Continued pretraining on Qwen2.5-1.5B-base*
  - [InfiR2-1.5B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-Instruct-FP8): *Supervised fine-tuning on InfiR2-1.5B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
- **7B**
  - [InfiR2-7B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-base-FP8): *Continued pretraining on Qwen2.5-7B-base*
  - [InfiR2-7B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-Instruct-FP8): *Supervised fine-tuning on InfiR2-7B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
  - [InfiR2-R1-7B-FP8-Preview](https://huggingface.co/InfiX-ai/InfiR2-R1-7B-FP8-Preview): *Multi-stage FP8 reinforcement learning*
## Model Performance
The table below compares InfiR2-7B-Instruct-FP8 with other 7B models on reasoning benchmarks. Note: "w. InfiAlign" denotes supervised fine-tuning (SFT) on the InfiAlign dataset.
<div align="center">
<table>
<thead>
<tr>
<th align="left">Model</th>
<th align="center">AIME 25</th>
<th align="center">AIME 24</th>
<th align="center">GPQA</th>
<th align="center">LiveCodeBench v5</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><strong>Deepseek-Distill-Qwen-7B</strong></td>
<td align="center">43.00</td>
<td align="center">49.00</td>
<td align="center">48.20</td>
<td align="center">37.60</td>
</tr>
<tr>
<td align="left"><strong>Qwen2.5-7B-base (w. InfiAlign)</strong></td>
<td align="center">33.75</td>
<td align="center">43.02</td>
<td align="center">48.11</td>
<td align="center">39.48</td>
</tr>
<tr>
<td align="left"><strong>InfiR2-7B-Instruct-FP8</strong></td>
<td align="center">40.62</td>
<td align="center">55.73</td>
<td align="center">45.33</td>
<td align="center">40.31</td>
</tr>
</tbody>
</table>
</div>
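To read the table as per-benchmark differences, the short sketch below subtracts one row from another. The scores are copied from the table. The `score_deltas` helper and the choice of the Qwen2.5-7B-base (w. InfiAlign) row as the reference are illustrative assumptions, not part of the original evaluation.

```python
# Scores copied from the table above (higher is better).
scores = {
    "Qwen2.5-7B-base (w. InfiAlign)": {
        "AIME 25": 33.75, "AIME 24": 43.02, "GPQA": 48.11, "LiveCodeBench v5": 39.48,
    },
    "InfiR2-7B-Instruct-FP8": {
        "AIME 25": 40.62, "AIME 24": 55.73, "GPQA": 45.33, "LiveCodeBench v5": 40.31,
    },
}


def score_deltas(model: str, baseline: str) -> dict[str, float]:
    """Return model score minus baseline score for each benchmark."""
    return {
        bench: round(scores[model][bench] - scores[baseline][bench], 2)
        for bench in scores[model]
    }


deltas = score_deltas("InfiR2-7B-Instruct-FP8", "Qwen2.5-7B-base (w. InfiAlign)")
for bench, delta in deltas.items():
    print(f"{bench:<18} {delta:+.2f}")
```

With these numbers, InfiR2-7B-Instruct-FP8 scores higher than the reference row on AIME 25, AIME 24, and LiveCodeBench v5, and lower on GPQA.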
## Quick Start
```python
from vllm import LLM, SamplingParams

MODEL_NAME = "InfiX-ai/InfiR2-7B-Instruct-FP8"
prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."
MAX_NEW_TOKENS = 256
TEMPERATURE = 0.8

# Load the model; dtype="auto" lets vLLM pick the dtype from the model config.
llm = LLM(
    model=MODEL_NAME,
    dtype="auto",
)

# vLLM samples whenever temperature > 0, so no separate sampling flag is needed.
sampling_params = SamplingParams(
    n=1,
    temperature=TEMPERATURE,
    max_tokens=MAX_NEW_TOKENS,
)

# Wrap the prompt in the model's chat template before generation.
tokenizer = llm.get_tokenizer()
messages = [
    {"role": "user", "content": prompt_text}
]
prompt_formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate(
    prompt_formatted,
    sampling_params,
)

# One prompt with n=1 yields a single completion.
generated_text = outputs[0].outputs[0].text
llm_response = generated_text.strip()

print("\n" + "=" * 70)
print(f"Prompt: \n{prompt_text}")
print("-" * 70)
print(f"(LLM Response): \n{llm_response}")
print("=" * 70)
```
## Model Download
```bash
# Create a directory for models
mkdir -p ./models
# Download InfiR2-7B-Instruct-FP8 model
huggingface-cli download --resume-download InfiX-ai/InfiR2-7B-Instruct-FP8 --local-dir ./models/InfiR2-7B-Instruct-FP8
```
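The same download can also be scripted from Python. The sketch below assumes the `huggingface_hub` package is installed and uses its `snapshot_download` function. The helper names `target_dir` and `download_model` are hypothetical and exist only for this example.

```python
from pathlib import Path

REPO_ID = "InfiX-ai/InfiR2-7B-Instruct-FP8"


def target_dir(models_dir: str = "./models") -> Path:
    """Local directory for the model, mirroring the --local-dir path used above."""
    return Path(models_dir) / REPO_ID.split("/")[-1]


def download_model(models_dir: str = "./models") -> str:
    """Download the full model snapshot and return the local path."""
    # Imported here so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    local_dir = target_dir(models_dir)
    local_dir.mkdir(parents=True, exist_ok=True)  # same effect as `mkdir -p`
    return snapshot_download(repo_id=REPO_ID, local_dir=str(local_dir))
```

Calling `download_model()` fetches the files into `./models/InfiR2-7B-Instruct-FP8`, the same location as the CLI command.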
## Intended Uses
### Direct Use
This model is intended for research and commercial use. Example use cases include:
- Instruction following
- Mathematical reasoning
- Code generation
- General reasoning
### Out-of-Scope Use
The model should **not** be used for:
- Generating harmful, offensive, or inappropriate content
- Creating misleading information
## Acknowledgements
* We would like to express our gratitude for the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine) and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).
## Citation
If you find our work useful, please cite:
```bibtex
@misc{wang2025infir2comprehensivefp8training,
title={InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models},
author={Wenjun Wang and Shuo Cai and Congkai Xie and Mingfa Feng and Yiming Zhang and Zhen Li and Kejing Yang and Ming Li and Jiannong Cao and Hongxia Yang},
year={2025},
eprint={2509.22536},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2509.22536},
}
```