Safetensors · qwen2 · fp8

juezhi committed · commit 12bde89 · verified · 1 parent: fecb158

Update README.md

Files changed (1): README.md (+143 −37)
README.md CHANGED
@@ -2,58 +2,138 @@
  license: apache-2.0
  ---
 
- ## Introduction
- **InfiR2-7B-Instruct-FP8** is supervised fine-tuned (SFT) from **InfiR2-7B-base-FP8**, using FP8 and the InfiAlign dataset.
-
- ## Model Download
-
- ```bash
- # Create a directory for models
- mkdir -p ./models
- # Download the Instruct model
- huggingface-cli download --resume-download InfiX-ai/InfiR2-7B-Instruct-FP8 --local-dir ./models/InfiR2-7B-Instruct-FP8
- ```
-
- ## Quick Start
 
  ```python
  import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer
 
  MODEL_NAME = "InfiX-ai/InfiR2-7B-Instruct-FP8"
 
  prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."
 
- MAX_NEW_TOKENS = 256
- TEMPERATURE = 0.8
- DO_SAMPLE = True
 
- tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
 
- device = "cuda" if torch.cuda.is_available() else "cpu"
- model = AutoModelForCausalLM.from_pretrained(
-     MODEL_NAME,
-     torch_dtype=torch.bfloat16 if device == "cuda" else None
- ).to(device)
 
  messages = [
      {"role": "user", "content": prompt_text}
  ]
- input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").to(device)
 
- with torch.no_grad():
-     output_ids = model.generate(
-         input_ids,
-         max_new_tokens=MAX_NEW_TOKENS,
-         temperature=TEMPERATURE,
-         do_sample=DO_SAMPLE,
-         pad_token_id=tokenizer.eos_token_id
-     )
 
- generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
 
- response_start_index = generated_text.rfind(prompt_text) + len(prompt_text)
- llm_response = generated_text[response_start_index:].strip()
 
  print("\n" + "="*70)
  print(f"Prompt: \n{prompt_text}")
@@ -62,11 +142,37 @@ print(f"(LLM Response): \n{llm_response}")
  print("="*70)
  ```
 
- ## Acknowledgements
 
  * We would like to express our gratitude to the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine) and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).
 
- ## Citation
 
  If you find our work useful, please cite:
 
  license: apache-2.0
  ---
 
+ # InfiR2-7B-Instruct-FP8
+
+ <p align="center">
+   <a href="https://arxiv.org/abs/2509.22536">📄 Paper</a> &nbsp;|&nbsp;
+   <a href="https://infix-ai.com/research/infir2/">🌐 Project Website</a>
+ </p>
+
+ We performed supervised fine-tuning on **InfiR2-7B-base-FP8** in FP8 format in two stages, using the InfiAlign-SFT-72k and InfiAlign-SFT-165k datasets, with the hyperparameters shown below.
+
+ <div align="center">
+
+ | Parameter | Value |
+ | :---: | :---: |
+ | **Batch Size** | 128 |
+ | **Learning Rate** | $1 \times 10^{-4}$ |
+ | **Minimum Learning Rate** | $1 \times 10^{-5}$ |
+ | **Weight Decay** | 0.1 |
+ | **Context Length** | 32k |
+
+ </div>
+
+ The resulting model is **InfiR2-7B-Instruct-FP8**.
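The learning-rate entries in the table (peak $1 \times 10^{-4}$, floor $1 \times 10^{-5}$) imply a decaying schedule. A minimal sketch, assuming a warmup-free cosine decay to the floor (the card does not state the schedule shape, so `cosine_lr` is purely illustrative):

```python
import math

PEAK_LR = 1e-4  # "Learning Rate" from the table
MIN_LR = 1e-5   # "Minimum Learning Rate" from the table

def cosine_lr(step: int, total_steps: int,
              peak: float = PEAK_LR, floor: float = MIN_LR) -> float:
    """Cosine decay from `peak` at step 0 to `floor` at the final step."""
    progress = min(step / total_steps, 1.0)
    return floor + 0.5 * (peak - floor) * (1 + math.cos(math.pi * progress))

# Starts at the peak, ends at the floor, decreasing monotonically in between.
print(cosine_lr(0, 1000), cosine_lr(500, 1000), cosine_lr(1000, 1000))
```

Any schedule that interpolates between the two table values (linear, cosine, etc.) would be consistent with the card; cosine is simply the common choice for this kind of SFT run.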
+
+ **Training Recipe**:
+ <p align="center">
+ <img src="fp8_recipe.png" width="80%"/>
+ </p>
+
+ - Stable and reproducible performance
+ - Efficient, low-memory training
+
+ ## 🚀 InfiR2 Model Series
+
+ The InfiR2 framework offers multiple model variants with different sizes and training strategies:
+
+ - **1.5B**
+   - [InfiR2-1.5B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-base-FP8): *Continual pretraining on Qwen2.5-1.5B-base*
+   - [InfiR2-1.5B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-1.5B-Instruct-FP8): *Supervised fine-tuning on InfiR2-1.5B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
+ - **7B**
+   - [InfiR2-7B-base-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-base-FP8): *Continual pretraining on Qwen2.5-7B-base*
+   - [InfiR2-7B-Instruct-FP8](https://huggingface.co/InfiX-ai/InfiR2-7B-Instruct-FP8): *Supervised fine-tuning on InfiR2-7B-base-FP8 with the [InfiAlign dataset](https://huggingface.co/papers/2508.05496)*
+   - [InfiR2-R1-7B-FP8](https://huggingface.co/InfiX-ai/InfiR2-R1-7B-FP8): *Reinforcement learning on InfiR2-7B-Instruct-FP8 with the DAPO dataset*
+
+ ## 📊 Model Performance
+ Below is a performance comparison of InfiR2-7B-Instruct-FP8 on reasoning benchmarks. Note: "w. InfiAlign" denotes supervised fine-tuning (SFT) with the InfiAlign dataset.
+
+ <div align="center">
+
+ <table>
+ <thead>
+ <tr>
+ <th align="left">Model</th>
+ <th align="center">AIME 25</th>
+ <th align="center">AIME 24</th>
+ <th align="center">GPQA</th>
+ <th align="center">LiveCodeBench v5</th>
+ </tr>
+ </thead>
+ <tbody>
+ <tr>
+ <td align="left"><strong>Deepseek-Distill-Qwen-7B</strong></td>
+ <td align="center">43.00</td>
+ <td align="center">49.00</td>
+ <td align="center">48.20</td>
+ <td align="center">37.60</td>
+ </tr>
+ <tr>
+ <td align="left"><strong>Qwen2.5-7B-base (w. InfiAlign)</strong></td>
+ <td align="center">33.75</td>
+ <td align="center">43.02</td>
+ <td align="center">48.11</td>
+ <td align="center">39.48</td>
+ </tr>
+ <tr>
+ <td align="left"><strong>InfiR2-7B-Instruct-FP8</strong></td>
+ <td align="center">40.62</td>
+ <td align="center">55.73</td>
+ <td align="center">45.33</td>
+ <td align="center">40.31</td>
+ </tr>
+ </tbody>
+ </table>
+
+ </div>
+
+ ## 🎭 Quick Start
 
  ```python
+ from vllm import LLM, SamplingParams
  import torch
 
  MODEL_NAME = "InfiX-ai/InfiR2-7B-Instruct-FP8"
 
  prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."
 
+ MAX_NEW_TOKENS = 256
+ TEMPERATURE = 0.8
 
+ llm = LLM(
+     model=MODEL_NAME,
+     dtype="auto",
+ )
 
+ sampling_params = SamplingParams(
+     n=1,
+     temperature=TEMPERATURE,
+     max_tokens=MAX_NEW_TOKENS,
+ )
 
+ tokenizer = llm.get_tokenizer()
  messages = [
      {"role": "user", "content": prompt_text}
  ]
+ prompt_formatted = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 
+ outputs = llm.generate(
+     prompt_formatted,
+     sampling_params
+ )
 
+ generated_text = outputs[0].outputs[0].text
 
+ llm_response = generated_text.strip()
 
  print("\n" + "="*70)
  print(f"Prompt: \n{prompt_text}")
  print(f"(LLM Response): \n{llm_response}")
  print("="*70)
  ```
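`apply_chat_template` above renders the message list into the model's chat markup before generation. Qwen2-family models use ChatML-style turn delimiters; the sketch below shows the general shape only (the `to_chatml` helper is ours, not the tokenizer's actual Jinja template, which may also insert a default system turn):

```python
def to_chatml(messages, add_generation_prompt=True):
    """Render a message list into ChatML-style markup (illustrative sketch)."""
    parts = [f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
             for m in messages]
    if add_generation_prompt:
        # Open an assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

print(to_chatml([{"role": "user", "content": "Briefly explain what a black hole is."}]))
```

In practice always use the tokenizer's own `apply_chat_template`, which reads the exact template shipped with the checkpoint.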
 
+ ## 📚 Model Download
+
+ ```bash
+ # Create a directory for models
+ mkdir -p ./models
+ # Download the InfiR2-7B-Instruct-FP8 model
+ huggingface-cli download --resume-download InfiX-ai/InfiR2-7B-Instruct-FP8 --local-dir ./models/InfiR2-7B-Instruct-FP8
+ ```
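The same download can be scripted from Python via `huggingface_hub.snapshot_download`. A minimal sketch mirroring the CLI layout above (the `local_dir_for` and `download` helpers are ours, for illustration):

```python
from pathlib import Path

REPO_ID = "InfiX-ai/InfiR2-7B-Instruct-FP8"

def local_dir_for(repo_id: str, root: str = "./models") -> str:
    """Mirror the CLI layout: <root>/<repo name>."""
    return str(Path(root) / repo_id.split("/")[-1])

def download(repo_id: str = REPO_ID) -> str:
    """Fetch all files in the repo; returns the local directory path."""
    # Requires `pip install huggingface_hub`; partial downloads are resumed.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id=repo_id, local_dir=local_dir_for(repo_id))
```

Calling `download()` fetches roughly the same snapshot as the `huggingface-cli download --resume-download` command above.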
+ ## 🎯 Intended Uses
+
+ ### ✅ Direct Use
+
+ This model is intended for research and commercial use. Example use cases include:
+
+ - Instruction following
+ - Mathematical reasoning
+ - Code generation
+ - General reasoning
+
+ ### ❌ Out-of-Scope Use
+
+ The model should **not** be used for:
+
+ - Generating harmful, offensive, or inappropriate content
+ - Creating misleading information
+
+ ## 🙏 Acknowledgements
 
  * We would like to express our gratitude to the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine) and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).
 
+ ## 📌 Citation
 
  If you find our work useful, please cite: