Instructions to use microsoft/Phi-4-mini-reasoning with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use microsoft/Phi-4-mini-reasoning with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="microsoft/Phi-4-mini-reasoning")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-4-mini-reasoning")
model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-4-mini-reasoning")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
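The README change in the commit further down this page switches the example from default decoding to sampling (`temperature=0.8`, `top_p=0.95`, `do_sample=True`) with `max_new_tokens=32768`. A minimal sketch of applying those settings to the direct-loading snippet above follows. The helper names `sampling_kwargs` and `generate_reply` are hypothetical, and `model` and `tokenizer` are assumed to be the objects loaded above.

```python
from typing import Any, Dict, List


def sampling_kwargs(
    max_new_tokens: int = 32768,
    temperature: float = 0.8,
    top_p: float = 0.95,
) -> Dict[str, Any]:
    """Collect the sampling settings from the updated README example.

    Hypothetical helper: only the default values come from the commit.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive when sampling")
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in the interval (0, 1]")
    return {
        "max_new_tokens": max_new_tokens,
        "temperature": temperature,
        "top_p": top_p,
        "do_sample": True,
    }


def generate_reply(model, tokenizer, messages: List[Dict[str, str]], **overrides) -> str:
    """Generate one sampled reply and return only the newly generated text."""
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)
    outputs = model.generate(**inputs, **sampling_kwargs(**overrides))
    # Slice off the prompt tokens so that only the completion is decoded.
    new_tokens = outputs[0][inputs["input_ids"].shape[-1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

For a quick smoke test, a call such as `generate_reply(model, tokenizer, messages, max_new_tokens=256)` keeps the sampling settings but shortens the output budget.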

Local Apps

vLLM

How to use microsoft/Phi-4-mini-reasoning with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "microsoft/Phi-4-mini-reasoning"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "microsoft/Phi-4-mini-reasoning",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
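The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above: same URL, header, and JSON body. The function names `build_chat_request` and `send_chat_request` are illustrative, and the default base URL assumes the server started above is listening on port 8000.

```python
import json
import urllib.request

MODEL_ID = "microsoft/Phi-4-mini-reasoning"


def build_chat_request(
    prompt: str,
    base_url: str = "http://localhost:8000",
    model: str = MODEL_ID,
) -> urllib.request.Request:
    """Build the POST request that the curl example sends."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `send_chat_request(build_chat_request("What is the capital of France?"))` returns the parsed response. Passing `base_url="http://localhost:30000"` targets a server on another port.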

Use Docker

docker model run hf.co/microsoft/Phi-4-mini-reasoning

SGLang

How to use microsoft/Phi-4-mini-reasoning with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "microsoft/Phi-4-mini-reasoning" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "microsoft/Phi-4-mini-reasoning",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "microsoft/Phi-4-mini-reasoning" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "microsoft/Phi-4-mini-reasoning",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
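Because the endpoint is OpenAI-compatible, the assistant's text is expected under `choices[0].message.content` in the JSON response. The sketch below extracts it. `extract_reply` is a hypothetical helper, and `SAMPLE_RESPONSE` is a hand-written, trimmed-down body used only to illustrate the shape, not output captured from a server.

```python
import json

# Hand-written illustration of the response shape; real responses carry
# additional fields (id, usage, finish_reason, ...).
SAMPLE_RESPONSE = json.dumps(
    {
        "model": "microsoft/Phi-4-mini-reasoning",
        "choices": [
            {"index": 0, "message": {"role": "assistant", "content": "Paris."}}
        ],
    }
)


def extract_reply(response_body: str) -> str:
    """Return the assistant message text from a chat completions response."""
    data = json.loads(response_body)
    choices = data.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


print(extract_reply(SAMPLE_RESPONSE))
```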

Docker Model Runner
How to use microsoft/Phi-4-mini-reasoning with Docker Model Runner:
```
docker model run hf.co/microsoft/Phi-4-mini-reasoning
```

gargamit committed on Apr 30, 2025

Commit 9db0932 (verified) · 1 Parent(s): 7196d37

Upload 2 files

Files changed (3)

.gitattributes +1 -0
Phi-4-Mini-Reasoning.pdf +3 -0
README.md +17 -9

.gitattributes CHANGED

@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
 tokenizer.json filter=lfs diff=lfs merge=lfs -text
+Phi-4-Mini-Reasoning.pdf filter=lfs diff=lfs merge=lfs -text

Phi-4-Mini-Reasoning.pdf ADDED

@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f4a8d862e83d76d77e7a2d17ecb6edef792c0efdc61a1b4085bc44f7a748e6f5
+size 654894

README.md CHANGED

@@ -20,15 +20,15 @@ widget:
 Phi-4-mini-reasoning is a lightweight open model built upon synthetic data with a focus on high-quality, reasoning dense data further finetuned for more advanced math reasoning capabilities.
 The model belongs to the Phi-4 model family and supports 128K token context length.
-📰 [Phi-4-mini-reasoning Microsoft Blog](https://aka.ms/phi4-feb2025) <br>
-📖 [Phi-4-mini-reasoning Technical Report](https://aka.ms/phi-4-multimodal/techreport) <br>
+📰 [Phi-4-mini-reasoning Blog](https://aka.ms/phi4-mini-reasoning/blog) <br>
+📖 [Phi-4-mini-reasoning Technical Report](https://aka.ms/phi4-mini-reasoning/techreport) <br>
 👩‍🍳 [Phi Cookbook](https://github.com/microsoft/PhiCookBook) <br>
 🏡 [Phi Portal](https://azure.microsoft.com/en-us/products/phi) <br>
-🖥️ Try It [Azure](https://aka.ms/phi-4-mini/azure), [Huggingface](https://huggingface.co/spaces/microsoft/phi-4-mini) <br>
+🖥️ Try It [Azure](https://aka.ms/phi4-mini-reasoning/azure) <br>
 🚀 [Model paper](https://huggingface.co/papers/2503.01743)
-🎉**Phi-4**: [[multimodal-instruct](https://huggingface.co/microsoft/Phi-4-multimodal-instruct) | [onnx](https://huggingface.co/microsoft/Phi-4-multimodal-instruct-onnx)];
+🎉**Phi-4 models**: [[Phi-4-reasoning](https://huggingface.co/microsoft/Phi-4-reasoning)] | [[multimodal-instruct](https://huggingface.co/microsoft/Phi-4-multimodal-instruct) | [onnx](https://huggingface.co/microsoft/Phi-4-multimodal-instruct-onnx)];
 [[mini-instruct](https://huggingface.co/microsoft/Phi-4-mini-instruct) | [onnx](https://huggingface.co/microsoft/Phi-4-mini-instruct-onnx)]
 ## Intended Uses
@@ -95,18 +95,18 @@ This format is used for general conversation and instructions:
 ```
 ### Inference with transformers
-Phi-4-mini-reasoning has been integrated in the `4.49.0` version of `transformers`. The current `transformers` version can be verified with: `pip list | grep transformers`.
+Phi-4-mini-reasoning has been integrated in the `4.51.3` version of `transformers`. The current `transformers` version can be verified with: `pip list | grep transformers`.
 Python 3.8 and 3.10 will work best.
 List of required packages:
 ```
 flash_attn==2.7.4.post1
 torch==2.5.1
-transformers==4.49.0
+transformers==4.51.3
 accelerate==1.3.0
 ```
-Phi-4-mini-reasoning is also available in [Azure AI Studio]()
+Phi-4-mini-reasoning is also available in [Azure AI Studio](https://aka.ms/phi-4-mini-reasoning/azure)
 #### Example
@@ -137,7 +137,13 @@ inputs = tokenizer.apply_chat_template(
     return_tensors="pt",
 )
-outputs = model.generate(**inputs.to(model.device), max_new_tokens=32768)
+outputs = model.generate(
+    **inputs.to(model.device),
+    max_new_tokens=32768,
+    temperature=0.8,
+    top_p=0.95,
+    do_sample=True,
+)
 outputs = tokenizer.batch_decode(outputs[:, inputs["input_ids"].shape[-1]:])
 print(outputs[0])
@@ -157,7 +163,7 @@ print(outputs[0])
 + **Dates:** Trained in February 2024<br>
 + **Status:** This is a static model trained on offline datasets with the cutoff date of February 2025 for publicly available data.<br>
 + **Supported languages:** English<br>
-+ **Release date:** May 2025<br>
++ **Release date:** April 2025<br>
 ### Training Datasets
@@ -186,6 +192,8 @@ If you want to run the model on:
 The Phi-4 family of models has adopted a robust safety post-training approach. This approach leverages a variety of both open-source and in-house generated datasets. The overall technique employed to do the safety alignment is a combination of SFT, DPO (Direct Preference Optimization), and RLHF (Reinforcement Learning from Human Feedback) approaches  by utilizing human-labeled and synthetic English-language datasets, including publicly available datasets focusing on helpfulness and harmlessness, as well as various questions and answers targeted to multiple safety categories.
+Phi-4-Mini-Reasoning was developed in accordance with Microsoft's responsible AI principles. Potential safety risks in the model’s responses were assessed using the Azure AI Foundry’s Risk and Safety Evaluation framework, focusing on harmful content, direct jailbreak, and model groundedness. The Phi-4-Mini-Reasoning Model Card contains additional information about our approach to safety and responsible AI considerations that developers should be aware of when using this model.
 ## Responsible AI Considerations
 Like other language models, the Phi family of models can potentially behave in ways that are unfair, unreliable, or offensive. Some of the limiting behaviors to be aware of include:
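
The README hunk above pins four packages and bumps `transformers` from `4.49.0` to `4.51.3`. As a rough sketch, an environment can be compared against those pins with `importlib.metadata`. The helper `find_mismatches` is hypothetical, and the distribution names are assumed to match the names in the README's package list.

```python
from importlib import metadata
from typing import Callable, Dict, Optional, Tuple

# Versions listed in the updated README. The distribution names are an
# assumption taken from that list as written.
PINNED: Dict[str, str] = {
    "flash_attn": "2.7.4.post1",
    "torch": "2.5.1",
    "transformers": "4.51.3",
    "accelerate": "1.3.0",
}


def find_mismatches(
    pinned: Dict[str, str] = PINNED,
    get_version: Callable[[str], str] = metadata.version,
) -> Dict[str, Tuple[Optional[str], str]]:
    """Map each mismatched package to (installed version or None, pinned version)."""
    mismatches: Dict[str, Tuple[Optional[str], str]] = {}
    for name, wanted in pinned.items():
        try:
            installed: Optional[str] = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches
```

Calling `find_mismatches()` returns an empty dict when every pinned package is installed at exactly the listed version. The `get_version` parameter exists so the lookup can be replaced in tests.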