---
license: apache-2.0
---


## Introduction
**InfiR2-R1-7B-FP8** is derived from **InfiR2-7B-base-FP8** via Supervised Fine-Tuning (SFT) in **FP8** precision on the **InfiAlign dataset**.

## Model Download
Download the InfiR2 model from the Hugging Face Hub into the `./models` directory.

```bash
# Create a directory for models
mkdir -p ./models
# Download the R1 model
huggingface-cli download --resume-download InfiX-ai/InfiR2-R1-7B-FP8 --local-dir ./models/InfiR2-R1-7B-FP8
```

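One practical benefit of the FP8 checkpoint is disk and bandwidth savings: FP8 stores one byte per weight, versus two bytes for bf16, so the raw weight payload is roughly half the size. A back-of-envelope sketch (the parameter count below is an approximation for a 7B-class model, not an exact figure from this repository):

```python
def checkpoint_size_gb(n_params: float, bytes_per_param: int) -> float:
    # Raw weight storage only; ignores metadata, tokenizer files,
    # and any layers kept in higher precision.
    return n_params * bytes_per_param / 1e9

n = 7.6e9  # approximate parameter count for a 7B-class model

print(f"fp8:  ~{checkpoint_size_gb(n, 1):.1f} GB")
print(f"bf16: ~{checkpoint_size_gb(n, 2):.1f} GB")
```

This is only a sizing estimate; the actual download size depends on which tensors are quantized and what else ships in the repository.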
## Quick Start

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "InfiX-ai/InfiR2-R1-7B-FP8"

prompt_text = "Briefly explain what a black hole is, and provide two interesting facts."

MAX_NEW_TOKENS = 256
TEMPERATURE = 0.8
DO_SAMPLE = True

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.bfloat16 if device == "cuda" else None
).to(device)

messages = [
    {"role": "user", "content": prompt_text}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(device)

with torch.no_grad():
    output_ids = model.generate(
        input_ids,
        max_new_tokens=MAX_NEW_TOKENS,
        temperature=TEMPERATURE,
        do_sample=DO_SAMPLE,
        pad_token_id=tokenizer.eos_token_id
    )

# Decode only the newly generated tokens, slicing off the prompt
llm_response = tokenizer.decode(
    output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
).strip()

print("\n" + "=" * 70)
print(f"Prompt: \n{prompt_text}")
print("-" * 70)
print(f"(LLM Response): \n{llm_response}")
print("=" * 70)
```

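The `TEMPERATURE` setting above divides the logits before the softmax that sampling draws from: values below 1 sharpen the distribution toward the top token, values above 1 flatten it toward uniform. A minimal sketch of the effect, independent of the model (the toy logits are illustrative only):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Scale logits by 1/temperature, then apply a numerically stable softmax
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
sharp = softmax_with_temperature(logits, 0.5)  # low T: peakier distribution
base = softmax_with_temperature(logits, 1.0)   # plain softmax
flat = softmax_with_temperature(logits, 2.0)   # high T: closer to uniform

print([round(p, 3) for p in sharp])
print([round(p, 3) for p in base])
print([round(p, 3) for p in flat])
```

This is why `TEMPERATURE = 0.8` with `do_sample=True` gives mildly focused but still varied generations; setting `do_sample=False` ignores temperature and decodes greedily.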
## Acknowledgements

* We express our gratitude to the following open-source projects: [Slime](https://github.com/THUDM/slime), [Megatron](https://github.com/NVIDIA/Megatron-LM), [TransformerEngine](https://github.com/NVIDIA/TransformerEngine) and [Qwen2.5](https://github.com/QwenLM/Qwen2.5-Math).

## Citation

If you find our work useful, please cite:

```bibtex
@misc{wang2025infir2comprehensivefp8training,
      title={InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models},
      author={Wenjun Wang and Shuo Cai and Congkai Xie and Mingfa Feng and Yiming Zhang and Zhen Li and Kejing Yang and Ming Li and Jiannong Cao and Hongxia Yang},
      year={2025},
      eprint={2509.22536},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.22536},
}
```