Safetensors · qwen2 · fp8

juezhi committed · commit 12bde89 · verified · 1 parent: fecb158

Update README.md

Files changed (1): README.md (+143 −37)
README.md CHANGED
@@ -2,58 +2,138 @@
  license: apache-2.0
  ---
 
- ## Introduction
- **InfiR2-7B-Instruct-FP8** is supervised fine-tuned (SFT) from **InfiR2-7B-base-FP8**, using FP8 and the InfiAlign dataset.
-
- ## Model Download
-
- ```bash
- # Create a directory for models
- mkdir -p ./models
- # Download the Instruct model
- huggingface-cli download --resume-download InfiX-ai/InfiR2-7B-Instruct-FP8 --local-dir ./models/InfiR2-7B-Instruct-FP8
- ```
-
- ## Quick Start
 
  ```python
  import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer
 
  MODEL_NAME = "InfiX-ai/InfiR2-7B-Instruct-FP8"
 
  prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."
 
- MAX_NEW_TOKENS = 256
- TEMPERATURE = 0.8
- DO_SAMPLE = True
 
- tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
 
- device = "cuda" if torch.cuda.is_available() else "cpu"
- model = AutoModelForCausalLM.from_pretrained(
-     MODEL_NAME,
-     torch_dtype=torch.bfloat16 if device == "cuda" else None
- ).to(device)
 
  messages = [
      {"role": "user", "content": prompt_text}
  ]
- input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)
 
- with torch.no_grad():
-     output_ids = model.generate(
-         input_ids,
-         max_new_tokens=MAX_NEW_TOKENS,
-         temperature=TEMPERATURE,
-         do_sample=DO_SAMPLE,
-         pad_token_id=tokenizer.eos_token_id
-     )
 
- generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
 
- response_start_index = generated_text.rfind(prompt_text) + len(prompt_text)
- llm_response = generated_text[response_start_index:].strip()
 
  print("\n" + "="*70)
  print(f"Prompt: \n{prompt_text}")
@@ -62,11 +142,37 @@ print(f"(LLM Response): \n{llm_response}")
  print("="*70)
  ```
 
- ## Acknowledgements
 
  * We would like to express our gratitude to the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine) and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).
 
- ## Citation
 
  If you find our work useful, please cite:
 
  license: apache-2.0
  ---
 
+ # InfiR2-7B-Instruct-FP8
+
+ <p align="center">
+   <a href="https://arxiv.org/abs/2509.22536">📄 Paper</a> &nbsp;|&nbsp;
+   <a href="https://infix-ai.com/research/infir2/">🌐 Project Website</a>
+ </p>
+
+ We performed supervised fine-tuning on **InfiR2-7B-base-FP8** in FP8 format in two stages, using the InfiAlign-SFT-72k and InfiAlign-SFT-165k datasets, with the hyperparameters shown below.
+
+ <div align="center">
+
+ | Parameter | Value |
+ | :---: | :---: |
+ | **Batch Size** | 128 |
+ | **Learning Rate** | $1 \times 10^{-4}$ |
+ | **Minimum Learning Rate** | $1 \times 10^{-5}$ |
+ | **Weight Decay** | 0.1 |
+ | **Context Length** | 32k |
+
+ </div>
+
+ The resulting model is **InfiR2-7B-Instruct-FP8**.
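The learning-rate entries in the table (peak $1 \times 10^{-4}$, floor $1 \times 10^{-5}$) imply a decaying schedule. A minimal sketch, assuming a warmup-free cosine decay to the floor (the card does not state the schedule shape, so `cosine_lr` is purely illustrative):

```python
import math

PEAK_LR = 1e-4  # "Learning Rate" from the table
MIN_LR = 1e-5   # "Minimum Learning Rate" from the table

def cosine_lr(step: int, total_steps: int,
              peak: float = PEAK_LR, floor: float = MIN_LR) -> float:
    """Cosine decay from `peak` at step 0 to `floor` at the final step."""
    progress = min(step / total_steps, 1.0)
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * progress))

# Starts at the peak, ends at the floor, decreasing monotonically in between.
print(cosine_lr(0, 1000), cosine_lr(500, 1000), cosine_lr(1000, 1000))
```

Any schedule that interpolates between the two table values (linear, cosine, etc.) would be consistent with the card; cosine is simply the common choice for this kind of SFT run.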
+
+ **Training Recipe**:
+ <p align="center">
+ <img src="fp8_recipe.png" width="80%"/>
+ </p>
+
+ - Stable and reproducible performance
+ - Efficient, low-memory training
+
+ ## 🚀 InfiR2 Model Series
+
+ The InfiR2 framework offers multiple model variants with different sizes and training strategies:
+
+ - **1.5B**
+   - [InfiR2-1.5B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-base-FP8): *Continual pretraining on Qwen2.5-1.5B-base*
+   - [InfiR2-1.5B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-Instruct-FP8): *Supervised fine-tuning on InfiR2-1.5B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
+ - **7B**
+   - [InfiR2-7B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-base-FP8): *Continual pretraining on Qwen2.5-7B-base*
+   - [InfiR2-7B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-Instruct-FP8): *Supervised fine-tuning on InfiR2-7B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
+   - [InfiR2-R1-7B-FP8](https://huggingface.co/InfiX-ai/InfiR2-R1-7B-FP8): *Reinforcement learning on InfiR2-7B-Instruct-FP8 with the DAPO dataset*
+
+ ## 📊 Model Performance
+ Below is a performance comparison of InfiR2-7B-Instruct-FP8 on reasoning benchmarks. Note: "w. InfiAlign" denotes supervised fine-tuning (SFT) with the InfiAlign dataset.
+
+ <div align="center">
+
+ <table>
+ <thead>
+ <tr>
+ <th align="left">Model</th>
+ <th align="center">AIME 25</th>
+ <th align="center">AIME 24</th>
+ <th align="center">GPQA</th>
+ <th align="center">LiveCodeBench v5</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td align="left"><strong>Deepseek-Distill-Qwen-7B</strong></td>
+ <td align="center">43.00</td>
+ <td align="center">49.00</td>
+ <td align="center">48.20</td>
+ <td align="center">37.60</td>
+ </tr>
+ <tr>
+ <td align="left"><strong>Qwen2.5-7B-base (w. InfiAlign)</strong></td>
+ <td align="center">33.75</td>
+ <td align="center">43.02</td>
+ <td align="center">48.11</td>
+ <td align="center">39.48</td>
+ </tr>
+ <tr>
+ <td align="left"><strong>InfiR2-7B-Instruct-FP8</strong></td>
+ <td align="center">40.62</td>
+ <td align="center">55.73</td>
+ <td align="center">45.33</td>
+ <td align="center">40.31</td>
+ </tr>
+ </tbody>
+ </table>
+
+ </div>
+
+ ## 🎭 Quick Start
 
  ```python
+ from vllm import LLM, SamplingParams
  import torch
 
  MODEL_NAME = "InfiX-ai/InfiR2-7B-Instruct-FP8"
 
  prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."
 
+ MAX_NEW_TOKENS = 256
+ TEMPERATURE = 0.8
 
+ llm = LLM(
+     model=MODEL_NAME,
+     dtype="auto",
+ )
 
+ sampling_params = SamplingParams(
+     n=1,
+     temperature=TEMPERATURE,
+     max_tokens=MAX_NEW_TOKENS,
+ )
 
+ tokenizer = llm.get_tokenizer()
  messages = [
      {"role": "user", "content": prompt_text}
  ]
+ prompt_formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 
+ outputs = llm.generate(
+     prompt_formatted,
+     sampling_params
+ )
 
+ generated_text = outputs[0].outputs[0].text
 
+ llm_response = generated_text.strip()
 
  print("\n" + "="*70)
  print(f"Prompt: \n{prompt_text}")
  print(f"(LLM Response): \n{llm_response}")
  print("="*70)
  ```
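`apply_chat_template` above renders the message list into the model's chat markup before generation. Qwen2-family models use ChatML-style turn delimiters; the sketch below shows the general shape only (the `to_chatml` helper is ours, not the tokenizer's actual Jinja template, which may also insert a default system turn):

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a message list into ChatML-style markup (illustrative sketch)."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
             for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

print(to_chatml([{"role": "user", "content": "Briefly explain what a black hole is."}]))
```

In practice always use the tokenizer's own `apply_chat_template`, which reads the exact template shipped with the checkpoint.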
 
+ ## 📚 Model Download
+
+ ```bash
+ # Create a directory for models
+ mkdir -p ./models
+ # Download the InfiR2-7B-Instruct-FP8 model
+ huggingface-cli download --resume-download InfiX-ai/InfiR2-7B-Instruct-FP8 --local-dir ./models/InfiR2-7B-Instruct-FP8
+ ```
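The same download can be scripted from Python via `huggingface_hub.snapshot_download`. A minimal sketch mirroring the CLI layout above (the `local_dir_for` and `download` helpers are ours, for illustration):

```python
from pathlib import Path

REPO_ID = "InfiX-ai/InfiR2-7B-Instruct-FP8"

def local_dir_for(repo_id: str, root: str = "./models") -> str:
    """Mirror the CLI layout: <root>/<repo name>."""
    return str(Path(root) / repo_id.split("/")[-1])

def download(repo_id: str = REPO_ID) -> str:
    """Fetch all files in the repo; returns the local directory path."""
    # Requires `pip install huggingface_hub`; partial downloads are resumed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, local_dir=local_dir_for(repo_id))
```

Calling `download()` fetches roughly the same snapshot as the `huggingface-cli download --resume-download` command above.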
+ ## 🎯 Intended Uses
+
+ ### ✅ Direct Use
+
+ This model is intended for research and commercial use. Example use cases include:
+
+ - Instruction following
+ - Mathematical reasoning
+ - Code generation
+ - General reasoning
+
+ ### ❌ Out-of-Scope Use
+
+ The model should **not** be used for:
+
+ - Generating harmful, offensive, or inappropriate content
+ - Creating misleading information
+
+ ## 🙏 Acknowledgements
 
  * We would like to express our gratitude to the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine) and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).
 
+ ## 📌 Citation
 
  If you find our work useful, please cite: