Text Generation
Transformers
Safetensors
English
phi3
nlp
math
code
conversational
text-generation-inference
Instructions to use microsoft/Phi-4-mini-reasoning with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use microsoft/Phi-4-mini-reasoning with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="microsoft/Phi-4-mini-reasoning")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-4-mini-reasoning")
model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-4-mini-reasoning")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use microsoft/Phi-4-mini-reasoning with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "microsoft/Phi-4-mini-reasoning"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "microsoft/Phi-4-mini-reasoning",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/microsoft/Phi-4-mini-reasoning
- SGLang
How to use microsoft/Phi-4-mini-reasoning with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "microsoft/Phi-4-mini-reasoning" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "microsoft/Phi-4-mini-reasoning",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "microsoft/Phi-4-mini-reasoning" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "microsoft/Phi-4-mini-reasoning",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Docker Model Runner
How to use microsoft/Phi-4-mini-reasoning with Docker Model Runner:
docker model run hf.co/microsoft/Phi-4-mini-reasoning
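The curl calls above for vLLM and SGLang both target an OpenAI-compatible `/v1/chat/completions` endpoint, so the same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `chat` are hypothetical, and it assumes one of the servers shown above is already running locally and returns the usual OpenAI-style `choices[0].message.content` field.

```python
import json
import urllib.request


def build_chat_request(prompt, model="microsoft/Phi-4-mini-reasoning"):
    """Build the same JSON body that the curl examples send."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt, base_url="http://localhost:8000/v1"):
    """POST a single-turn chat request and return the reply text.

    Assumes a server is already listening at base_url
    (port 8000 for the vLLM example, 30000 for the SGLang example).
    """
    body = json.dumps(build_chat_request(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    # Assumption: OpenAI-style response with the reply under choices[0].message.content.
    return payload["choices"][0]["message"]["content"]
```

With the vLLM server running, `chat("What is the capital of France?")` sends the request from the curl example; for SGLang, pass `base_url="http://localhost:30000/v1"`.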
Upload 2 files
- .gitattributes +1 -0
- Phi-4-Mini-Reasoning.pdf +3 -0
- README.md +17 -9
.gitattributes
CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
+Phi-4-Mini-Reasoning.pdf filter=lfs diff=lfs merge=lfs -text
Phi-4-Mini-Reasoning.pdf
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f4a8d862e83d76d77e7a2d17ecb6edef792c0efdc61a1b4085bc44f7a748e6f5
+size 654894
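The three added lines are a Git LFS pointer: the repository stores only the spec version, a SHA-256 object ID, and the byte size, while the PDF itself lives in LFS storage. As a rough sketch (the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and only the `sha256` scheme seen in this pointer is handled), a downloaded file could be checked against the pointer like this:

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algorithm": algorithm,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and SHA-256 digest the pointer records."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["digest"]
    )


# Pointer content copied from the diff above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f4a8d862e83d76d77e7a2d17ecb6edef792c0efdc61a1b4085bc44f7a748e6f5
size 654894
"""

pointer = parse_lfs_pointer(POINTER)
```

Reading the downloaded PDF in binary mode and passing its bytes to `matches_pointer` would confirm that the file matches what this commit recorded.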
README.md
CHANGED
@@ -20,15 +20,15 @@ widget:
 Phi-4-mini-reasoning is a lightweight open model built upon synthetic data with a focus on high-quality, reasoning dense data further finetuned for more advanced math reasoning capabilities.
 The model belongs to the Phi-4 model family and supports 128K token context length.
 
-📰 [Phi-4-mini-reasoning
-[Phi-4-mini-reasoning Technical Report](https://aka.ms/
+📰 [Phi-4-mini-reasoning Blog](https://aka.ms/phi4-mini-reasoning/blog) <br>
+[Phi-4-mini-reasoning Technical Report](https://aka.ms/phi4-mini-reasoning/techreport) <br>
 👩‍🍳 [Phi Cookbook](https://github.com/microsoft/PhiCookBook) <br>
 🏡 [Phi Portal](https://azure.microsoft.com/en-us/products/phi) <br>
-🖥️ Try It [Azure](https://aka.ms/
+🖥️ Try It [Azure](https://aka.ms/phi4-mini-reasoning/azure) <br>
 
 [Model paper](https://huggingface.co/papers/2503.01743)
 
-**Phi-4**: [[multimodal-instruct](https://huggingface.co/microsoft/Phi-4-multimodal-instruct) | [onnx](https://huggingface.co/microsoft/Phi-4-multimodal-instruct-onnx)];
+**Phi-4 models**: [[Phi-4-reasoning](https://huggingface.co/microsoft/Phi-4-reasoning)] | [[multimodal-instruct](https://huggingface.co/microsoft/Phi-4-multimodal-instruct) | [onnx](https://huggingface.co/microsoft/Phi-4-multimodal-instruct-onnx)];
 [[mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct) | [onnx](https://huggingface.co/microsoft/Phi-4-mini-instruct-onnx)]
 
 ## Intended Uses
@@ -95,18 +95,18 @@ This format is used for general conversation and instructions:
 ```
 ### Inference with transformers
 
-Phi-4-mini-reasoning has been integrated in the `4.
+Phi-4-mini-reasoning has been integrated in the `4.51.3` version of `transformers`. The current `transformers` version can be verified with: `pip list | grep transformers`.
 Python 3.8 and 3.10 will work best.
 List of required packages:
 
 ```
 flash_attn==2.7.4.post1
 torch==2.5.1
-transformers==4.
+transformers==4.51.3
 accelerate==1.3.0
 ```
 
-Phi-4-mini-reasoning is also available in [Azure AI Studio]()
+Phi-4-mini-reasoning is also available in [Azure AI Studio](https://aka.ms/phi-4-mini-reasoning/azure)
 
 #### Example
 
@@ -137,7 +137,13 @@ inputs = tokenizer.apply_chat_template(
     return_tensors="pt",
 )
 
-outputs = model.generate(
+outputs = model.generate(
+    **inputs.to(model.device),
+    max_new_tokens=32768,
+    temperature=0.8,
+    top_p=0.95,
+    do_sample=True,
+)
 outputs = tokenizer.batch_decode(outputs[:, inputs["input_ids"].shape[-1]:])
 
 print(outputs[0])
@@ -157,7 +163,7 @@ print(outputs[0])
 + **Dates:** Trained in February 2024<br>
 + **Status:** This is a static model trained on offline datasets with the cutoff date of February 2025 for publicly available data.<br>
 + **Supported languages:** English<br>
-+ **Release date:**
++ **Release date:** April 2025<br>
 
 ### Training Datasets
 
@@ -186,6 +192,8 @@ If you want to run the model on:
 
 The Phi-4 family of models has adopted a robust safety post-training approach. This approach leverages a variety of both open-source and in-house generated datasets. The overall technique employed to do the safety alignment is a combination of SFT, DPO (Direct Preference Optimization), and RLHF (Reinforcement Learning from Human Feedback) approaches by utilizing human-labeled and synthetic English-language datasets, including publicly available datasets focusing on helpfulness and harmlessness, as well as various questions and answers targeted to multiple safety categories.
 
+Phi-4-Mini-Reasoning was developed in accordance with Microsoft's responsible AI principles. Potential safety risks in the model's responses were assessed using the Azure AI Foundry's Risk and Safety Evaluation framework, focusing on harmful content, direct jailbreak, and model groundedness. The Phi-4-Mini-Reasoning Model Card contains additional information about our approach to safety and responsible AI considerations that developers should be aware of when using this model.
+
 ## Responsible AI Considerations
 
 Like other language models, the Phi family of models can potentially behave in ways that are unfair, unreliable, or offensive. Some of the limiting behaviors to be aware of include:
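The README change in this commit sets `temperature=0.8`, `top_p=0.95`, and `do_sample=True` in the `model.generate` call. To illustrate what those two sampling parameters do, here is a simplified, standard-library sketch of temperature scaling followed by nucleus (top-p) truncation. It is an illustration under stated assumptions, not the `transformers` implementation, and the function names `top_p_distribution` and `sample_token` are hypothetical.

```python
import math
import random


def top_p_distribution(logits, temperature=0.8, top_p=0.95):
    """Return the renormalized distribution after temperature scaling
    and nucleus (top-p) truncation."""
    # Temperature below 1 sharpens the distribution; above 1 flattens it.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]  # numerically stable softmax
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the most likely tokens until their cumulative mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Zero out everything outside the nucleus and renormalize the rest.
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


def sample_token(logits, temperature=0.8, top_p=0.95, rng=random):
    """Draw one token index from the truncated distribution."""
    dist = top_p_distribution(logits, temperature, top_p)
    return rng.choices(range(len(dist)), weights=dist, k=1)[0]
```

With a sharply peaked set of logits, the nucleus collapses to the single most likely token, so sampling behaves like greedy decoding; with flatter logits, more candidates survive the cut and the output varies between runs.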