---
license: apache-2.0
---


## Introduction
**InfiR2-R1-7B-FP8** is derived from **InfiR2-7B-base-FP8** via Supervised Fine-Tuning (SFT) in **FP8** precision on the **InfiAlign dataset**.

## Model Download
Download the InfiR2 model from the Hugging Face Hub into the `./models` directory.

```bash
# Create a directory for models
mkdir -p ./models
# Download the R1 model
huggingface-cli download --resume-download InfiX-ai/InfiR2-R1-7B-FP8 --local-dir ./models/InfiR2-R1-7B-FP8
```

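One practical benefit of the FP8 checkpoint is disk and bandwidth savings: FP8 stores one byte per weight, versus two bytes for bf16, so the raw weight payload is roughly half the size. A back-of-envelope sketch (the parameter count below is an approximation for a 7B-class model, not an exact figure from this repository):

```python
def checkpoint_size_gb(n_params: float, bytes_per_param: int) -> float:
    # Raw weight storage only; ignores metadata, tokenizer files,
    # and any layers kept in higher precision.
    return n_params * bytes_per_param / 1e9

n = 7.6e9  # approximate parameter count for a 7B-class model

print(f"fp8:  ~{checkpoint_size_gb(n, 1):.1f} GB")
print(f"bf16: ~{checkpoint_size_gb(n, 2):.1f} GB")
```

This is only a sizing estimate; the actual download size depends on which tensors are quantized and what else ships in the repository.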
## Quick Start

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "InfiX-ai/InfiR2-R1-7B-FP8"

prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."

MAX_NEW_TOKENS = 256
TEMPERATURE = 0.8
DO_SAMPLE = True

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.bfloat16 if device == "cuda" else None
).to(device)

messages = [
    {"role": "user", "content": prompt_text}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=MAX_NEW_TOKENS,
        temperature=TEMPERATURE,
        do_sample=DO_SAMPLE,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens, slicing off the prompt
llm_response = tokenizer.decode(
    output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
).strip()

print("\n" + "=" * 70)
print(f"Prompt: \n{prompt_text}")
print("-" * 70)
print(f"(LLM Response): \n{llm_response}")
print("=" * 70)
```

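The `TEMPERATURE` setting above divides the logits before the softmax that sampling draws from: values below 1 sharpen the distribution toward the top token, values above 1 flatten it toward uniform. A minimal sketch of the effect, independent of the model (the toy logits are illustrative only):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Scale logits by 1/temperature, then apply a numerically stable softmax
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
sharp = softmax_with_temperature(logits, 0.5)  # low T: peakier distribution
base = softmax_with_temperature(logits, 1.0)   # plain softmax
flat = softmax_with_temperature(logits, 2.0)   # high T: closer to uniform

print([round(p, 3) for p in sharp])
print([round(p, 3) for p in base])
print([round(p, 3) for p in flat])
```

This is why `TEMPERATURE = 0.8` with `do_sample=True` gives mildly focused but still varied generations; setting `do_sample=False` ignores temperature and decodes greedily.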
## Acknowledgements

* We express our gratitude to the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine) and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).

## Citation

If you find our work useful, please cite:

```bibtex
@misc{wang2025infir2comprehensivefp8training,
      title={InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models},
      author={Wenjun Wang and Shuo Cai and Congkai Xie and Mingfa Feng and Yiming Zhang and Zhen Li and Kejing Yang and Ming Li and Jiannong Cao and Hongxia Yang},
      year={2025},
      eprint={2509.22536},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.22536},
}
```