Safetensors
qwen2
fp8
baicaihaochi121 committed on
Commit 7720292 · verified · 1 Parent(s): 552f334

Update README.md

Files changed (1):
  1. README.md +33 -46
README.md CHANGED
@@ -2,7 +2,7 @@
 license: apache-2.0
 ---

- # InfiR2-R1-7B-FP8

 <p align="center">
   <a href="https://arxiv.org/abs/2509.22536">📄 Paper</a> &nbsp; | &nbsp;
@@ -10,52 +10,53 @@ license: apache-2.0
   <a href="https://infix-ai.com/research/infir2/">🌐 Project Website</a> &nbsp;
 </p>

- We performed **Reinforcement Learning (RL)** on the **InfiR2-7B-Instruct-FP8** model using the **dapo-math-17k** dataset and the **FP8 format** (inference), with hyperparameters shown below.
-
- <div align="center">
 
 
- | Parameter (stage 2) | Value |
- | :---: | :---: |
- | **Batch Size** | 128 |
- | **N Samples Per Prompt** | 16 |
- | **Global Batch Size** | 2048 |
- | **Maximum Response Length** | 16384 |
- | **Rollout Temperature** | 1.1 |
- | **Learning Rate** | 1e-6 |
- | **Weight Decay** | 0.1 |
- | **Eps Clip** | 0.2 |
- | **KL Loss Coefficient** | 0.00 |

- </div>

- The resulting model is the **InfiR2-R1-7B-FP8**.

 **Training Recipe**:
 <p align="center">
- <img src="fp8_recipe.png" width="100%"/>
 <p>

 - Stable and Reproducible Performance
 - Efficient and Low memory Training

- ## 🚀 InfiR2 Model Series

- The InfiR2 framework offers multiple variants model with different size and training strategy:

- - **1.5B**
-   - [InfiR2-1.5B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-base-FP8): *Continue pretrain on Qwen2.5-1.5B-base*
-   - [InfiR2-1.5B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-Instruct-FP8): *Supervised fine-tuning on InfiR2-1.5B-base-FP8 with [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
- - **7B**
-   - [InfiR2-7B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-base-FP8): *Continue pretrain on Qwen2.5-7B-base*
-   - [InfiR2-7B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-Instruct-FP8): *Supervised fine-tuning on InfiR2-7B-base-FP8 with [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
-   - [InfiR2-R1-7B-FP8](https://huggingface.co/InfiX-ai/InfiR2-R1-7B-FP8): *Reinforcement learning on InfiR2-7B-Instruct-FP8 with dapo dataset*

- ## 📊 Model Performance
- Below is the performance comparison of **InfiR2-R1-7B-FP8** on reasoning benchmarks. Note: 'w. InfiAlign' denotes Supervised Fine-Tuning (SFT) using the InfiAlign dataset.

 <div align="center">

@@ -77,20 +78,6 @@ Below is the performance comparison of **InfiR2-R1-7B-FP8** on reasoning benchma
 <td align="center">48.20</td>
 <td align="center">37.60</td>
 </tr>
- <tr>
- <td align="left"><strong>Qwen2.5-7B-base (w. InfiAlign)</strong></td>
- <td align="center">33.75</td>
- <td align="center">43.02</td>
- <td align="center">48.11</td>
- <td align="center">39.48</td>
- </tr>
- <tr>
- <td align="left"><strong>InfiR2-7B-Instruct-FP8</strong></td>
- <td align="center">40.62</td>
- <td align="center">55.73</td>
- <td align="center">45.33</td>
- <td align="center">40.31</td>
- </tr>
 <tr>
 <td align="left"><strong>InfiR2-R1-7B-FP8</strong></td>
 <td align="center"><strong>53.64</strong></td>
@@ -112,7 +99,7 @@ from vllm import LLM, SamplingParams
 import torch
 import os

- MODEL_NAME = "InfiX-ai/InfiR2-R1-7B-FP8"

 prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."

@@ -159,8 +146,8 @@ print("="*70)
 ```bash
 # Create a directory for models
 mkdir -p ./models
- # Download InfiR2-R1-7B-FP8 model
- huggingface-cli download --resume-download InfiX-ai/InfiR2-R1-7B-FP8 --local-dir ./models/InfiR2-R1-7B-FP8
 ```

 
 license: apache-2.0
 ---

+ # InfiR2-R1-7B-FP8-Preview

 <p align="center">
   <a href="https://arxiv.org/abs/2509.22536">📄 Paper</a> &nbsp; | &nbsp;
   <a href="https://infix-ai.com/research/infir2/">🌐 Project Website</a> &nbsp;
 </p>

+ We performed multi-stage **Reinforcement Learning (RL)** in the FP8 format. More experimental details will be released soon. Stay tuned!
 
 
+ ## 🚀 InfiR2 Model Series

+ The InfiR2 framework offers multiple model variants with different sizes and training strategies:

+ - **1.5B**
+   - [InfiR2-1.5B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-base-FP8): *Continued pretraining of Qwen2.5-1.5B-base*
+   - [InfiR2-1.5B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-Instruct-FP8): *Supervised fine-tuning of InfiR2-1.5B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
+ - **7B**
+   - [InfiR2-7B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-base-FP8): *Continued pretraining of Qwen2.5-7B-base*
+   - [InfiR2-7B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-Instruct-FP8): *Supervised fine-tuning of InfiR2-7B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
+   - [InfiR2-R1-7B-FP8-Preview](https://huggingface.co/InfiX-ai/InfiR2-R1-7B-FP8-Preview): *Multi-stage FP8 reinforcement learning*

 
 **Training Recipe**:
 <p align="center">
+ <img src="fp8_recipe.png" width="60%"/>
 <p>

 - Stable and Reproducible Performance
 - Efficient and Low memory Training

+ ## 📊 Hyperparameters & Model Performance

+ **Training hyperparameters**:

+ <div align="center">

+ | Parameter | Value |
+ | :---: | :---: |
+ | **Batch Size** | 128 |
+ | **N Samples Per Prompt** | 16 |
+ | **Global Batch Size** | 2048 |
+ | **Maximum Response Length** | 16384 |
+ | **Rollout Temperature** | 1.1 |
+ | **Learning Rate** | 1e-6 |
+ | **Weight Decay** | 0.1 |
+ | **Eps Clip** | 0.2 |
+ | **KL Loss Coefficient** | 0.00 |
+
+ </div>
+
+ Below is the performance comparison of **InfiR2-R1-7B-FP8-Preview** on reasoning benchmarks.
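To connect the hyperparameters above to code, here is a minimal, hypothetical sketch of a PPO-style clipped policy-gradient loss using the Eps Clip and KL Loss Coefficient values from the table. This is not the InfiR2 training code; the function name, per-token layout, and KL approximation are assumptions for illustration only:

```python
import math

def clipped_pg_loss(logp_new, logp_old, advantages, eps_clip=0.2, kl_coef=0.0):
    """PPO-style clipped surrogate loss over per-token log-probabilities.

    logp_new / logp_old: log-probs under the current and rollout policies.
    advantages: per-token advantage estimates.
    kl_coef=0.0 mirrors the KL Loss Coefficient in the table above.
    """
    total = 0.0
    for ln, lo, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(ln - lo)                        # importance ratio
        clipped = max(min(ratio, 1.0 + eps_clip), 1.0 - eps_clip)
        surrogate = min(ratio * adv, clipped * adv)      # pessimistic bound
        kl_penalty = kl_coef * (lo - ln)                 # vanishes when kl_coef = 0
        total += -surrogate + kl_penalty
    return total / len(logp_new)

# A ratio of e^0.5 ~ 1.65 exceeds 1 + eps_clip, so with a positive advantage
# the surrogate is capped at 1.2 and stops growing with the ratio.
loss = clipped_pg_loss([0.0, 0.5], [0.0, 0.0], [1.0, 1.0])
print(loss)
```

As a sanity check on the table itself: 128 prompts per batch times 16 samples per prompt gives exactly the global batch size of 2048 sequences.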
 
 <div align="center">

 <td align="center">48.20</td>
 <td align="center">37.60</td>
 </tr>
 <tr>
 <td align="left"><strong>InfiR2-R1-7B-FP8</strong></td>
 <td align="center"><strong>53.64</strong></td>
 
 import torch
 import os

+ MODEL_NAME = "InfiX-ai/InfiR2-R1-7B-FP8-Preview"

 prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."
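The hunk above shows only a fragment of the README's vLLM example. For orientation, here is a minimal, self-contained sketch of how such a snippet typically fits together; the chat-marker helper follows Qwen2's `<|im_start|>` convention, and the sampling values are placeholder assumptions, not the README's actual settings:

```python
def build_chat_prompt(user_text: str) -> str:
    """Wrap a user message in Qwen2-style <|im_start|> chat markers."""
    return (
        "<|im_start|>user\n" + user_text + "<|im_end|>\n"
        "<|im_start|>assistant\n"
    )

def generate(model_name: str, user_text: str) -> str:
    # Imported locally so this sketch can be read (and the helper tested)
    # without vLLM installed; generation itself needs a supported GPU.
    from vllm import LLM, SamplingParams

    llm = LLM(model=model_name)
    params = SamplingParams(temperature=0.7, top_p=0.95, max_tokens=1024)
    outputs = llm.generate([build_chat_prompt(user_text)], params)
    return outputs[0].outputs[0].text

prompt = build_chat_prompt("Briefly explain what a black hole is.")
print(prompt.endswith("<|im_start|>assistant\n"))  # True
```

In practice, `tokenizer.apply_chat_template` from `transformers` is the more robust way to build the prompt string; the manual helper here is only for illustration.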
 
 
 ```bash
 # Create a directory for models
 mkdir -p ./models
+ # Download the InfiR2-R1-7B-FP8-Preview model
+ huggingface-cli download --resume-download InfiX-ai/InfiR2-R1-7B-FP8-Preview --local-dir ./models/InfiR2-R1-7B-FP8-Preview
 ```