---
license: apache-2.0
---

# InfiR2-R1-7B-FP8-Preview

<p align="center">
  <a href="https://arxiv.org/abs/2509.22536">Paper</a> |
  <a href="https://github.com/InfiXAI/InfiR2">GitHub</a> |
  <a href="https://infix-ai.com/research/infir2/">Project Website</a>
</p>

This model was trained with multi-stage FP8 **Reinforcement Learning (RL)**. More experimental details will be released soon. Stay tuned!

## InfiR2 Model Series

The InfiR2 framework offers multiple model variants with different sizes and training strategies:

- **1.5B**
  - [InfiR2-1.5B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-base-FP8): *Continued pretraining on Qwen2.5-1.5B-base*
  - [InfiR2-1.5B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-Instruct-FP8): *Supervised fine-tuning on InfiR2-1.5B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
- **7B**
  - [InfiR2-7B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-base-FP8): *Continued pretraining on Qwen2.5-7B-base*
  - [InfiR2-7B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-Instruct-FP8): *Supervised fine-tuning on InfiR2-7B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
  - [InfiR2-R1-7B-FP8-Preview](https://huggingface.co/InfiX-ai/InfiR2-R1-7B-FP8-Preview): *Multi-stage FP8 Reinforcement Learning*

**Training Recipe**:

<p align="center">
  <img src="fp8_recipe.png" width="100%"/>
</p>

- Stable and reproducible performance
- Efficient, low-memory training

## Hyperparameters & Model Performance

**Training hyperparameters**:

<div align="center">

| Parameter | Value |
| :---: | :---: |
| **Batch Size** | 128 |
| **N Samples Per Prompt** | 16 |
| **Global Batch Size** | 2048 |
| **Maximum Response Length** | 16384 |
| **Rollout Temperature** | 1.1 |
| **Learning Rate** | 1e-6 |
| **Weight Decay** | 0.1 |
| **Eps Clip** | 0.2 |
| **KL Loss Coefficient** | 0.00 |

</div>
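
The values in the table are internally consistent: the global batch size equals the batch size times the number of samples per prompt (128 × 16 = 2048). The sketch below checks that arithmetic and illustrates what a clipping range of 0.2 means for a PPO-style clipped surrogate. The table does not name the RL objective, so `clipped_surrogate` and its inputs are illustrative assumptions, not the authors' training code. Reading "Batch Size" as prompts per step is also an assumption, chosen because it matches the arithmetic.

```python
# Values copied from the hyperparameter table.
BATCH_SIZE = 128            # assumed meaning: prompts per training step
N_SAMPLES_PER_PROMPT = 16   # rollouts sampled for each prompt
EPS_CLIP = 0.2


def global_batch_size(batch_size: int, n_samples: int) -> int:
    """Total number of rollouts per training step."""
    return batch_size * n_samples


def clipped_surrogate(ratio: float, advantage: float, eps: float = EPS_CLIP) -> float:
    """PPO-style clipped objective for one sample (assumed form, hypothetical helper)."""
    # Keep the probability ratio inside [1 - eps, 1 + eps].
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    # Take the more pessimistic of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


print(global_batch_size(BATCH_SIZE, N_SAMPLES_PER_PROMPT))
print(clipped_surrogate(1.5, 1.0))   # ratio above 1 + eps is capped
print(clipped_surrogate(0.5, -1.0))  # ratio below 1 - eps is capped
```

With a KL loss coefficient of 0.00, no KL penalty term would be added to such an objective.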

The table below compares **InfiR2-R1-7B-FP8-Preview** with Deepseek-Distill-Qwen-7B on reasoning benchmarks.

<div align="center">

<table>
  <thead>
    <tr>
      <th align="left">Model</th>
      <th align="center">AIME 25</th>
      <th align="center">AIME 24</th>
      <th align="center">GPQA</th>
      <th align="center">LiveCodeBench v5</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td align="left"><strong>Deepseek-Distill-Qwen-7B</strong></td>
      <td align="center">43.00</td>
      <td align="center">49.00</td>
      <td align="center">48.20</td>
      <td align="center">37.60</td>
    </tr>
    <tr>
      <td align="left"><strong>InfiR2-R1-7B-FP8-Preview</strong></td>
      <td align="center"><strong>53.64</strong></td>
      <td align="center"><strong>60.62</strong></td>
      <td align="center"><strong>49.18</strong></td>
      <td align="center">39.36</td>
    </tr>
  </tbody>
</table>

</div>
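
To read the comparison at a glance, the short script below computes the per-benchmark difference between the two rows. The scores are copied from the table above; the dictionary and function names are illustrative only.

```python
# Scores copied from the comparison table.
SCORES = {
    "Deepseek-Distill-Qwen-7B": {
        "AIME 25": 43.00, "AIME 24": 49.00, "GPQA": 48.20, "LiveCodeBench v5": 37.60,
    },
    "InfiR2-R1-7B-FP8-Preview": {
        "AIME 25": 53.64, "AIME 24": 60.62, "GPQA": 49.18, "LiveCodeBench v5": 39.36,
    },
}


def score_deltas(model: str, baseline: str) -> dict[str, float]:
    """Return model minus baseline for every benchmark, rounded to 2 decimals."""
    return {
        bench: round(SCORES[model][bench] - SCORES[baseline][bench], 2)
        for bench in SCORES[model]
    }


deltas = score_deltas("InfiR2-R1-7B-FP8-Preview", "Deepseek-Distill-Qwen-7B")
for bench, delta in deltas.items():
    print(f"{bench}: {delta:+.2f}")
```

On these numbers, the largest gains are on AIME 24 (+11.62) and AIME 25 (+10.64), with smaller gains on LiveCodeBench v5 (+1.76) and GPQA (+0.98).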

## Quick Start

```python
from vllm import LLM, SamplingParams

MODEL_NAME = "InfiX-ai/InfiR2-R1-7B-FP8-Preview"

prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."

# Sampling settings; vLLM samples whenever temperature > 0.
MAX_NEW_TOKENS = 256
TEMPERATURE = 0.8

# Load the model; dtype="auto" lets vLLM pick the dtype from the model config.
llm = LLM(
    model=MODEL_NAME,
    dtype="auto",
)

sampling_params = SamplingParams(
    n=1,
    temperature=TEMPERATURE,
    max_tokens=MAX_NEW_TOKENS,
)

# Format the prompt with the model's chat template.
tokenizer = llm.get_tokenizer()
messages = [
    {"role": "user", "content": prompt_text}
]
prompt_formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

outputs = llm.generate(
    prompt_formatted,
    sampling_params
)

# Take the first completion of the first (and only) prompt.
generated_text = outputs[0].outputs[0].text
llm_response = generated_text.strip()

print("\n" + "="*70)
print(f"Prompt: \n{prompt_text}")
print("-" * 70)
print(f"(LLM Response): \n{llm_response}")
print("="*70)
```

## Model Download

```bash
# Create a directory for models
mkdir -p ./models

# Download InfiR2-R1-7B-FP8-Preview model
huggingface-cli download --resume-download InfiX-ai/InfiR2-R1-7B-FP8-Preview --local-dir ./models/InfiR2-R1-7B-FP8-Preview
```
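
The same download can also be scripted from Python. This is a minimal sketch that assumes the `huggingface_hub` package is installed; `local_dir_for` and `download_model` are hypothetical helper names, and the target directory mirrors the one used in the shell command above.

```python
from pathlib import Path

REPO_ID = "InfiX-ai/InfiR2-R1-7B-FP8-Preview"


def local_dir_for(repo_id: str, root: str = "./models") -> Path:
    """Map a repo id to <root>/<model name> (hypothetical layout helper)."""
    return Path(root) / repo_id.split("/")[-1]


def download_model(repo_id: str = REPO_ID) -> str:
    """Download every file in the repo and return the local path."""
    # Imported lazily so local_dir_for works without the package installed.
    from huggingface_hub import snapshot_download

    target = local_dir_for(repo_id)
    target.mkdir(parents=True, exist_ok=True)
    return snapshot_download(repo_id=repo_id, local_dir=str(target))
```

Calling `download_model()` returns the local directory, which can then be passed as `model=` to `LLM(...)` in place of the repo id.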

## Intended Uses

### Direct Use

This model is intended for research and commercial use. Example use cases include:

- Instruction following
- Mathematical reasoning
- Code generation
- General reasoning

### Out-of-Scope Use

The model should **not** be used for:

- Generating harmful, offensive, or inappropriate content
- Creating misleading information

## Acknowledgements

* We are grateful to the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine), and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).

## Citation

If you find our work useful, please cite:

```bibtex
@misc{wang2025infir2comprehensivefp8training,
      title={InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models},
      author={Wenjun Wang and Shuo Cai and Congkai Xie and Mingfa Feng and Yiming Zhang and Zhen Li and Kejing Yang and Ming Li and Jiannong Cao and Hongxia Yang},
      year={2025},
      eprint={2509.22536},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.22536},
}
```