---
license: apache-2.0
---
# InfiR2-7B-Instruct-FP8
<p align="center">
  <a href="https://arxiv.org/abs/2509.22536">📄 Paper</a> &nbsp; | &nbsp;
  <a href="https://github.com/InfiXAI/InfiR2">🐙 Github</a> &nbsp; | &nbsp;
  <a href="https://infix-ai.com/research/infir2/">🌐 Project Website</a>
</p>
We performed supervised fine-tuning on **InfiR2-7B-base-FP8** in FP8 format, in two stages, using the InfiAlign-SFT-72k and InfiAlign-SFT-165k datasets.
**Training Recipe**:
<p align="center">
<img src="fp8_recipe.png" width="100%"/>
</p>
- Stable and reproducible performance
- Efficient, low-memory training
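To build intuition for the FP8 number format (commonly E4M3 for weights and activations; the exact format used here is an assumption, not stated on this card), here is a minimal pure-Python rounding sketch. It is purely illustrative and not part of the training code: it models 3 explicit mantissa bits and the E4M3 maximum of ±448, and ignores subnormals and the exact exponent range.

```python
import math

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3-representable value (simplified sketch).

    Models 3 explicit mantissa bits (plus the implicit leading 1) and
    clamps to the E4M3 maximum of +/-448; subnormals are not modeled.
    """
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    m, e = math.frexp(abs(x))   # abs(x) = m * 2**e, with m in [0.5, 1)
    m = round(m * 16) / 16      # keep 3 explicit mantissa bits
    return sign * min(math.ldexp(m, e), 448.0)

print(quantize_e4m3(0.3))     # -> 0.3125 (nearest representable neighbour)
print(quantize_e4m3(1000.0))  # -> 448.0 (clamped to the E4M3 maximum)
```

The coarse mantissa and narrow range are exactly why FP8 training recipes rely on careful scaling to stay stable.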
**Hyperparameters**:
<div align="center">

| Parameter | Value |
| :---: | :---: |
| **Batch Size** | 64 |
| **Learning Rate** | 1e-5 |
| **Minimum Learning Rate** | 1e-6 |
| **Weight Decay** | 0.05 |
| **Context Length** | 32k |

</div>
The resulting model is the **InfiR2-7B-Instruct-FP8**.
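The decay from the learning rate of 1e-5 down to the minimum of 1e-6 can be sketched as follows. The schedule shape is not specified on this card; cosine decay is assumed here as a common choice, and `TOTAL_STEPS` is a hypothetical placeholder (the real step count depends on the dataset and batch size).

```python
import math

LR, MIN_LR = 1e-5, 1e-6  # values from the hyperparameter table above
TOTAL_STEPS = 1000       # hypothetical; depends on dataset size and batch size

def lr_at(step: int) -> float:
    # Cosine decay from LR down to MIN_LR over TOTAL_STEPS.
    progress = min(step / TOTAL_STEPS, 1.0)
    return MIN_LR + 0.5 * (LR - MIN_LR) * (1 + math.cos(math.pi * progress))

print(lr_at(0))            # ~1e-05 at the start
print(lr_at(TOTAL_STEPS))  # ~1e-06 at the end
```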
## 🚀 InfiR2 Model Series
The InfiR2 framework offers multiple model variants with different sizes and training strategies:
- **1.5B**
  - [InfiR2-1.5B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-base-FP8): *Continued pretraining of Qwen2.5-1.5B-base*
  - [InfiR2-1.5B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-Instruct-FP8): *Supervised fine-tuning of InfiR2-1.5B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
- **7B**
  - [InfiR2-7B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-base-FP8): *Continued pretraining of Qwen2.5-7B-base*
  - [InfiR2-7B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-Instruct-FP8): *Supervised fine-tuning of InfiR2-7B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
  - [InfiR2-R1-7B-FP8-Preview](https://huggingface.co/InfiX-ai/InfiR2-R1-7B-FP8-Preview): *Multi-stage FP8 reinforcement learning*
## 📊 Model Performance
Below is a comparison of InfiR2-7B-Instruct-FP8 against baselines on reasoning benchmarks. Note: 'w. InfiAlign' denotes supervised fine-tuning (SFT) with the InfiAlign dataset.
<div align="center">
<table>
<thead>
<tr>
<th align="left">Model</th>
<th align="center">AIME 25</th>
<th align="center">AIME 24</th>
<th align="center">GPQA</th>
<th align="center">LiveCodeBench v5</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><strong>Deepseek-Distill-Qwen-7B</strong></td>
<td align="center">43.00</td>
<td align="center">49.00</td>
<td align="center">48.20</td>
<td align="center">37.60</td>
</tr>
<tr>
<td align="left"><strong>Qwen2.5-7B-base (w. InfiAlign)</strong></td>
<td align="center">33.75</td>
<td align="center">43.02</td>
<td align="center">48.11</td>
<td align="center">39.48</td>
</tr>
<tr>
<td align="left"><strong>InfiR2-7B-Instruct-FP8</strong></td>
<td align="center">40.62</td>
<td align="center">55.73</td>
<td align="center">45.33</td>
<td align="center">40.31</td>
</tr>
</tbody>
</table>
</div>
## 🎭 Quick Start
```python
from vllm import LLM, SamplingParams

MODEL_NAME = "InfiX-ai/InfiR2-7B-Instruct-FP8"
prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."
MAX_NEW_TOKENS = 256
TEMPERATURE = 0.8

# Load the FP8 model; vLLM picks up the quantization config automatically.
llm = LLM(
    model=MODEL_NAME,
    dtype="auto",
)

sampling_params = SamplingParams(
    n=1,
    temperature=TEMPERATURE,
    max_tokens=MAX_NEW_TOKENS,
)

# Format the conversation with the model's chat template.
tokenizer = llm.get_tokenizer()
messages = [
    {"role": "user", "content": prompt_text},
]
prompt_formatted = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

outputs = llm.generate(prompt_formatted, sampling_params)
llm_response = outputs[0].outputs[0].text.strip()

print("\n" + "=" * 70)
print(f"Prompt:\n{prompt_text}")
print("-" * 70)
print(f"LLM Response:\n{llm_response}")
print("=" * 70)
```
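For reference, Qwen2.5-derived models use a ChatML-style chat template, so the `prompt_formatted` string produced by `apply_chat_template` looks roughly like the output of the hand-rolled sketch below. This is for illustration only; the tokenizer's own template is authoritative and may also inject a default system message.

```python
def to_chatml(messages, add_generation_prompt=True):
    # Minimal ChatML-style formatting (illustrative sketch only; use
    # tokenizer.apply_chat_template in real code).
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    ]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = to_chatml([{"role": "user", "content": "Hello!"}])
print(prompt)
```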
## 📚 Model Download
```bash
# Create a directory for models
mkdir -p ./models
# Download InfiR2-7B-Instruct-FP8 model
huggingface-cli download --resume-download InfiX-ai/InfiR2-7B-Instruct-FP8 --local-dir ./models/InfiR2-7B-Instruct-FP8
```
## 🎯 Intended Uses
### ✅ Direct Use
This model is intended for research and commercial use. Example use cases include:
- Instruction following
- Mathematical reasoning
- Code generation
- General reasoning
### ❌ Out-of-Scope Use
The model should **not** be used for:
- Generating harmful, offensive, or inappropriate content
- Creating misleading information
## 🙏 Acknowledgements
* We would like to express our gratitude to the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine) and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).
## 📌 Citation
If you find our work useful, please cite:
```bibtex
@misc{wang2025infir2comprehensivefp8training,
  title={InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models},
  author={Wenjun Wang and Shuo Cai and Congkai Xie and Mingfa Feng and Yiming Zhang and Zhen Li and Kejing Yang and Ming Li and Jiannong Cao and Hongxia Yang},
  year={2025},
  eprint={2509.22536},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2509.22536},
}
```