Instructions to use microsoft/Phi-4-reasoning with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use microsoft/Phi-4-reasoning with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="microsoft/Phi-4-reasoning")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-4-reasoning")
model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-4-reasoning")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use microsoft/Phi-4-reasoning with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "microsoft/Phi-4-reasoning"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "microsoft/Phi-4-reasoning",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
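
The same request can be issued from Python. The following is a minimal sketch using only the standard library, assuming the server started by `vllm serve` above is listening on `localhost:8000`. The helper names `build_chat_request` and `ask` are illustrative and not part of vLLM.

```python
import json
import urllib.request

# Assumed to match the endpoint used in the curl example above.
BASE_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(prompt, model="microsoft/Phi-4-reasoning", url=BASE_URL):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, timeout=600):
    """Send the request and return the reply text (requires a running server)."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses carry the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `ask("What is the capital of France?")` would return the model's reply as a string. Building the request separately from sending it makes the payload easy to inspect without a live server.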

Use Docker

docker model run hf.co/microsoft/Phi-4-reasoning

SGLang

How to use microsoft/Phi-4-reasoning with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "microsoft/Phi-4-reasoning" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "microsoft/Phi-4-reasoning",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "microsoft/Phi-4-reasoning" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "microsoft/Phi-4-reasoning",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use microsoft/Phi-4-reasoning with Docker Model Runner:
```
docker model run hf.co/microsoft/Phi-4-reasoning
```

Extraction_v1

by imonban - opened May 1, 2025

base: refs/heads/main ← from: refs/pr/3

Files changed (6): +88 -215

README.md +52 -54
data_summary_card.md +0 -149
generation_config.json +0 -1
special_tokens_map.json +7 -0
tokenizer.json +14 -5
tokenizer_config.json +15 -6

README.md CHANGED

@@ -26,7 +26,7 @@ library_name: transformers
 # Phi-4-reasoning Model Card
-[Phi-4-reasoning Technical Report](https://huggingface.co/papers/2504.21318)
 ## Model Summary
@@ -53,56 +53,6 @@ library_name: transformers
 | **Primary Use Cases**         | Our model is designed to accelerate research on language models, for use as a building block for generative AI powered features. It provides uses for general purpose AI systems and applications (primarily in English) which require:<br><br>1. Memory/compute constrained environments.<br>2. Latency bound scenarios.<br>3. Reasoning and logic.                                                                                                    |
 | **Out-of-Scope Use Cases**    | This model is designed and tested for math reasoning only. Our models are not specifically designed or evaluated for all downstream purposes. Developers should consider common limitations of language models as they select use cases, and evaluate and mitigate for accuracy, safety, and fairness before using within a specific downstream use case, particularly for high-risk scenarios. Developers should be aware of and adhere to applicable laws or regulations (including privacy, trade compliance laws, etc.) that are relevant to their use case, including the model’s focus on English. Review the Responsible AI Considerations section below for further guidance when choosing a use case. Nothing contained in this Model Card should be interpreted as or deemed a restriction or modification to the license the model is released under.                                                                                                     |
-## Usage
-> [!IMPORTANT]
-> To fully take advantage of the model's capabilities, inference must use `temperature=0.8`, `top_k=50`, `top_p=0.95`, and `do_sample=True`. For more complex queries, set `max_new_tokens=32768` to allow for longer chain-of-thought (CoT).
-### Input Formats
-Given the nature of the training data, **always use** ChatML template with the **following system prompt** for inference:
-```bash
-<|im_start|>system<|im_sep|>
-You are Phi, a language model trained by Microsoft to help users. Your role as an assistant involves thoroughly exploring questions through a systematic thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution using the specified format: <think> {Thought section} </think> {Solution section}. In the Thought section, detail your reasoning process in steps. Each step should include detailed considerations such as analysing questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The Solution section should be logical, accurate, and concise and detail necessary steps needed to reach the conclusion. Now, try to solve the following question through the above guidelines:<|im_end|>
-<|im_start|>user<|im_sep|>
-What is the derivative of x^2?<|im_end|>
-<|im_start|>assistant<|im_sep|>
-```
-### With `transformers`
-```python
-from transformers import AutoTokenizer, AutoModelForCausalLM
-tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-4-reasoning")
-model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-4-reasoning", device_map="auto", torch_dtype="auto")
-messages = [
-    {"role": "system", "content": "You are Phi, a language model trained by Microsoft to help users. Your role as an assistant involves thoroughly exploring questions through a systematic thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution using the specified format: <think> {Thought section} </think> {Solution section}. In the Thought section, detail your reasoning process in steps. Each step should include detailed considerations such as analysing questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The Solution section should be logical, accurate, and concise and detail necessary steps needed to reach the conclusion. Now, try to solve the following question through the above guidelines:"},
-    {"role": "user", "content": "What is the derivative of x^2?"},
-]
-inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
-outputs = model.generate(
-    inputs.to(model.device),
-    max_new_tokens=4096,
-    temperature=0.8,
-    top_k=50,
-    top_p=0.95,
-    do_sample=True,
-)
-print(tokenizer.decode(outputs[0]))
-```
-### With `vllm`
-```bash
-vllm serve microsoft/Phi-4-reasoning --enable-reasoning --reasoning-parser deepseek_r1
-```
-*Phi-4-reasoning is also supported out-of-the-box by Ollama, llama.cpp, and any Phi-4 compatible framework.*
 ## Data Overview
 ### Training Datasets
@@ -187,6 +137,56 @@ At the high-level overview of the model quality on representative benchmarks. Fo
 Overall, Phi-4-reasoning, with only 14B parameters, performs well across a wide range of reasoning tasks, outperforming significantly larger open-weight models such as DeepSeek-R1 distilled 70B model and approaching the performance levels of full DeepSeek R1 model. We also test the models on multiple new reasoning benchmarks for algorithmic problem solving and planning, including 3SAT, TSP, and BA-Calendar. These new tasks are nominally out-of-domain for the models as the training process did not intentionally target these skills, but the models still show strong generalization to these tasks. Furthermore, when evaluating performance against standard general abilities benchmarks such as instruction following or non-reasoning tasks, we find that our new models improve significantly from Phi-4, despite the post-training being focused on reasoning skills in specific domains.
 ## Responsible AI Considerations
 Like other language models, Phi-4-reasoning can potentially behave in ways that are unfair, unreliable, or offensive. Some of the limiting behaviors to be aware of include:
@@ -213,6 +213,4 @@ Developers should apply responsible AI best practices and are responsible for en
 * **Generation of Harmful Content:** Developers should assess outputs for their context and use available safety classifiers or custom solutions appropriate for their use case.
-* **Misuse:** Other forms of misuse such as fraud, spam, or malware production may be possible, and developers should ensure that their applications do not violate applicable laws and regulations.
-* **Data Summary:** https://huggingface.co/microsoft/Phi-4-reasoning/blob/main/data_summary_card.md

 # Phi-4-reasoning Model Card
+[Phi-4-reasoning Technical Report](https://aka.ms/phi-reasoning/techreport)
 ## Model Summary
 | **Primary Use Cases**         | Our model is designed to accelerate research on language models, for use as a building block for generative AI powered features. It provides uses for general purpose AI systems and applications (primarily in English) which require:<br><br>1. Memory/compute constrained environments.<br>2. Latency bound scenarios.<br>3. Reasoning and logic.                                                                                                    |
 | **Out-of-Scope Use Cases**    | This model is designed and tested for math reasoning only. Our models are not specifically designed or evaluated for all downstream purposes. Developers should consider common limitations of language models as they select use cases, and evaluate and mitigate for accuracy, safety, and fairness before using within a specific downstream use case, particularly for high-risk scenarios. Developers should be aware of and adhere to applicable laws or regulations (including privacy, trade compliance laws, etc.) that are relevant to their use case, including the model’s focus on English. Review the Responsible AI Considerations section below for further guidance when choosing a use case. Nothing contained in this Model Card should be interpreted as or deemed a restriction or modification to the license the model is released under.                                                                                                     |
 ## Data Overview
 ### Training Datasets
 Overall, Phi-4-reasoning, with only 14B parameters, performs well across a wide range of reasoning tasks, outperforming significantly larger open-weight models such as DeepSeek-R1 distilled 70B model and approaching the performance levels of full DeepSeek R1 model. We also test the models on multiple new reasoning benchmarks for algorithmic problem solving and planning, including 3SAT, TSP, and BA-Calendar. These new tasks are nominally out-of-domain for the models as the training process did not intentionally target these skills, but the models still show strong generalization to these tasks. Furthermore, when evaluating performance against standard general abilities benchmarks such as instruction following or non-reasoning tasks, we find that our new models improve significantly from Phi-4, despite the post-training being focused on reasoning skills in specific domains.
+## Usage
+### Inference Parameters
+Inference is better with `temperature=0.8`, `top_p=0.95`, and `do_sample=True`. For more complex queries, set the maximum number of tokens to 32k to allow for longer chain-of-thought (CoT).
+### Input Formats
+Given the nature of the training data, always use ChatML template with the following system prompt for inference:
+```bash
+<|im_start|>system<|im_sep|>
+Your role as an assistant involves thoroughly exploring questions through a systematic thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution using the specified format: <think> {Thought section} <\think> {Solution section}. In the Thought section, detail your reasoning process in steps. Each step should include detailed considerations such as analysing questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The Solution section should be logical, accurate, and concise and detail necessary steps needed to reach the conclusion. Now, try to solve the following question through the above guidelines:<|im_end|>
+<|im_start|>user<|im_sep|>
+What is the derivative of x^2?<|im_end|>
+<|im_start|>assistant<|im_sep|>
+```
+### With `transformers`
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-4-reasoning")
+model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-4-reasoning", device_map="auto", torch_dtype="auto")
+messages = [
+    {"role": "system", "content": "You are Phi, a language model trained by Microsoft to help users. Your role as an assistant involves thoroughly exploring questions through a systematic thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution using the specified format: <think> {Thought section} </think> {Solution section}. In the Thought section, detail your reasoning process in steps. Each step should include detailed considerations such as analysing questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The Solution section should be logical, accurate, and concise and detail necessary steps needed to reach the conclusion. Now, try to solve the following question through the above guidelines:"},
+    {"role": "user", "content": "What is the derivative of x^2?"},
+]
+inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
+outputs = model.generate(
+    inputs.to(model.device),
+    max_new_tokens=4096,
+    temperature=0.8,
+    top_p=0.95,
+    do_sample=True,
+)
+print(tokenizer.decode(outputs[0]))
+```
+### With `vllm`
+```bash
+vllm serve microsoft/Phi-4-reasoning --enable-reasoning --reasoning-parser deepseek_r1
+```
+*Phi-4-reasoning is also supported out-of-the-box by Ollama, llama.cpp, and any Phi-4 compatible framework.*
 ## Responsible AI Considerations
 Like other language models, Phi-4-reasoning can potentially behave in ways that are unfair, unreliable, or offensive. Some of the limiting behaviors to be aware of include:
 * **Generation of Harmful Content:** Developers should assess outputs for their context and use available safety classifiers or custom solutions appropriate for their use case.
+* **Misuse:** Other forms of misuse such as fraud, spam, or malware production may be possible, and developers should ensure that their applications do not violate applicable laws and regulations.
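
The system prompt in the README asks the model to answer as `<think> {Thought section} </think> {Solution section}`. As an illustration of how a client might separate the two parts, here is a minimal sketch. The function name `split_reasoning` is hypothetical, and the code assumes the response contains at most one closing `</think>` tag.

```python
def split_reasoning(text, open_tag="<think>", close_tag="</think>"):
    """Return (thought, solution) from a response in the Thought/Solution format."""
    head, sep, tail = text.partition(close_tag)
    if not sep:
        # No closing tag found: treat the whole response as the solution.
        return "", text.strip()
    # Drop everything up to and including the opening tag, if present.
    thought = head.split(open_tag, 1)[-1].strip()
    return thought, tail.strip()


thought, solution = split_reasoning(
    "<think>d/dx x^2 = 2x by the power rule.</think> The derivative is 2x."
)
```

When serving with vLLM, the `--reasoning-parser` option shown in the README performs this separation on the server side, so client-side splitting of this kind would mainly apply to raw `transformers` output.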

data_summary_card.md DELETED

@@ -1,149 +0,0 @@
-# Data Summary for microsoft_Phi-4-reasoning
-## 1. General information
-**1.0.1 Version of the Summary:** 1.0
-**1.0.2 Last update:** 24-Nov-2025
-## 1.1 Model Developer Identification
-**1.1.1 Model Developer name and contact details:** Microsoft Corporation at One Microsoft Way, Redmond, WA 98052. Tel: 425-882-8080
-## 1.2 Model Identification
-**1.2.1 Versioned model name(s):** Phi-4-reasoning
-**1.2.2 Model release date:** 30-Apr-2025
-## 1.3 Overall training data size and characteristics
-### 1.3.1 Size of dataset and characteristics
-**1.3.1.A Text training data size:** 1 billion to 1 trillion tokens
-**1.3.1.B Text training data content:** Prompts sourced from publicly available websites, existing datasets, and licensed collections, augmented with synthetically generated problems; responses generated using o3-mini including chain-of-thought traces; includes STEM, coding, logical puzzles, and safety/Responsible AI alignment data
-**1.3.1.C Image training data size:** Not applicable. Images are not part of the training data
-**1.3.1.D Image training data content:** Not applicable
-**1.3.1.E Audio training data size:** Not applicable. Audio data is not part of the training data
-**1.3.1.F Audio training data content:** Not applicable
-**1.3.1.G Video training data size:** Not applicable. Video data is not part of the training data
-**1.3.1.H Video training data content:** Not applicable
-**1.3.1.I Other training data size:** Not applicable
-**1.3.1.J Other training data content:** Not applicable
-**1.3.2 Latest date of data acquisition/collection for model training:** 31-Mar-2025
-**1.3.3 Is data collection ongoing to update the model with new data collection after deployment?** No
-**1.3.4 Date the training dataset was first used to train the model:** 01-Jan-2025
-**1.3.5 Rationale or purpose of data selection:** Datasets were curated to emphasize complex multi-step reasoning and verifiable solutions across STEM, coding, and safety, selecting prompts at the boundary of base model capabilities. Synthetic problems and teacher-generated reasoning traces were used to distill structured chain-of-thought and promote concise, checkable answers, supporting robust reasoning performance and generalization to broader tasks
-## 2. List of data sources
-### 2.1 Publicly available datasets
-**2.1.1 Have you used publicly available datasets to train the model?** Yes
-## 2.2 Private non-publicly available datasets obtained from third parties
-### 2.2.1 Datasets commercially licensed by rights holders or their representatives
-**2.2.1.A Have you concluded transactional commercial licensing agreement(s) with rights holder(s) or with their representatives?** Yes
-### 2.2.2 Private datasets obtained from other third-parties
-**2.2.2.A Have you obtained private datasets from third parties that are not licensed as described in Section 2.2.1, such as data obtained from providers of private databases, or data intermediaries?** This information cannot be provided due to unavailability of the underlying data (e.g., loss, corruption, or other access limitations)
-## 2.3 Personal Information
-**2.3.1 Was personal data used to train the model?** Microsoft follows all relevant laws and regulations pertaining to personal information
-## 2.4 Synthetic data
-**2.4.1 Was any synthetic AI-generated data used to train the model?** Yes
-## 3. Data processing aspects
-### 3.1 Respect of reservation of rights from text and data mining exception or limitation
-**3.1.1 Does this dataset include any data protected by copyright, trademark, or patent?** Microsoft follows all required regulations and laws for processing data protected by copyright, trademark, or patent
-## 3.2 Other information
-**3.2.1 Does the dataset include information about consumer groups without revealing individual consumer identities?** Microsoft follows all required regulations and laws for protecting consumer identities
-**3.2.2 Was the dataset cleaned or modified before model training?** Yes

generation_config.json CHANGED

@@ -5,7 +5,6 @@
   "eos_token_id": 100265,
   "pad_token_id": 100349,
   "temperature": 0.8,
-  "top_k": 50,
   "top_p": 0.95,
   "transformers_version": "4.51.1"
 }

   "eos_token_id": 100265,
   "pad_token_id": 100349,
   "temperature": 0.8,
   "top_p": 0.95,
   "transformers_version": "4.51.1"
 }

special_tokens_map.json CHANGED

@@ -19,5 +19,12 @@
     "normalized": false,
     "rstrip": true,
     "single_word": false
   }
 }

     "normalized": false,
     "rstrip": true,
     "single_word": false
+  },
+  "unk_token": {
+    "content": "ï¿½",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
   }
 }

tokenizer.json CHANGED

@@ -3,6 +3,15 @@
   "truncation": null,
   "padding": null,
   "added_tokens": [
     {
       "id": 100256,
       "content": "<|dummy_0|>",
@@ -28,7 +37,7 @@
       "lstrip": true,
       "rstrip": true,
       "normalized": false,
-      "special": false
     },
     {
       "id": 100259,
@@ -37,7 +46,7 @@
       "lstrip": true,
       "rstrip": true,
       "normalized": false,
-      "special": false
     },
     {
       "id": 100260,
@@ -46,7 +55,7 @@
       "lstrip": true,
       "rstrip": true,
       "normalized": false,
-      "special": false
     },
     {
       "id": 100261,
@@ -856,7 +865,7 @@
       "lstrip": true,
       "rstrip": true,
       "normalized": false,
-      "special": false
     },
     {
       "id": 100351,
@@ -865,7 +874,7 @@
       "lstrip": true,
       "rstrip": true,
       "normalized": false,
-      "special": false
     }
   ],
   "normalizer": null,

   "truncation": null,
   "padding": null,
   "added_tokens": [
+    {
+      "id": 5809,
+      "content": "ï¿½",
+      "single_word": false,
+      "lstrip": false,
+      "rstrip": false,
+      "normalized": false,
+      "special": true
+    },
     {
       "id": 100256,
       "content": "<|dummy_0|>",
       "lstrip": true,
       "rstrip": true,
       "normalized": false,
+      "special": true
     },
     {
       "id": 100259,
       "lstrip": true,
       "rstrip": true,
       "normalized": false,
+      "special": true
     },
     {
       "id": 100260,
       "lstrip": true,
       "rstrip": true,
       "normalized": false,
+      "special": true
     },
     {
       "id": 100261,
       "lstrip": true,
       "rstrip": true,
       "normalized": false,
+      "special": true
     },
     {
       "id": 100351,
       "lstrip": true,
       "rstrip": true,
       "normalized": false,
+      "special": true
     }
   ],
   "normalizer": null,

tokenizer_config.json CHANGED

@@ -1,6 +1,14 @@
 {
   "add_prefix_space": false,
   "added_tokens_decoder": {
     "100256": {
       "content": "<|dummy_0|>",
       "lstrip": true,
@@ -23,7 +31,7 @@
       "normalized": false,
       "rstrip": true,
       "single_word": false,
-      "special": false
     },
     "100259": {
       "content": "<|fim_middle|>",
@@ -31,7 +39,7 @@
       "normalized": false,
       "rstrip": true,
       "single_word": false,
-      "special": false
     },
     "100260": {
       "content": "<|fim_suffix|>",
@@ -39,7 +47,7 @@
       "normalized": false,
       "rstrip": true,
       "single_word": false,
-      "special": false
     },
     "100261": {
       "content": "<|dummy_1|>",
@@ -759,7 +767,7 @@
       "normalized": false,
       "rstrip": true,
       "single_word": false,
-      "special": false
     },
     "100351": {
       "content": "</think>",
@@ -767,7 +775,7 @@
       "normalized": false,
       "rstrip": true,
       "single_word": false,
-      "special": false
     }
   },
   "bos_token": "<|endoftext|>",
@@ -778,5 +786,6 @@
   "model_max_length": 32768,
   "pad_token": "<|dummy_85|>",
   "padding_side": "left",
-  "tokenizer_class": "GPT2Tokenizer"
 }

 {
   "add_prefix_space": false,
   "added_tokens_decoder": {
+    "5809": {
+      "content": "ï¿½",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
     "100256": {
       "content": "<|dummy_0|>",
       "lstrip": true,
       "normalized": false,
       "rstrip": true,
       "single_word": false,
+      "special": true
     },
     "100259": {
       "content": "<|fim_middle|>",
       "normalized": false,
       "rstrip": true,
       "single_word": false,
+      "special": true
     },
     "100260": {
       "content": "<|fim_suffix|>",
       "normalized": false,
       "rstrip": true,
       "single_word": false,
+      "special": true
     },
     "100261": {
       "content": "<|dummy_1|>",
       "normalized": false,
       "rstrip": true,
       "single_word": false,
+      "special": true
     },
     "100351": {
       "content": "</think>",
       "normalized": false,
       "rstrip": true,
       "single_word": false,
+      "special": true
     }
   },
   "bos_token": "<|endoftext|>",
   "model_max_length": 32768,
   "pad_token": "<|dummy_85|>",
   "padding_side": "left",
+  "tokenizer_class": "GPT2Tokenizer",
+  "unk_token": "ï¿½"
 }
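
The tokenizer changes above flip several added tokens, including `</think>`, to `"special": true`. In Transformers, decoding with `skip_special_tokens=True` omits tokens flagged this way, which may matter for code that looks for `</think>` in decoded text. The following minimal sketch lists the flagged tokens in a config of this shape. The excerpt reproduces only three entries from the diff, plus one hypothetical non-special entry added for contrast, and the function name `special_tokens` is illustrative.

```python
import json

# Excerpt shaped like `added_tokens_decoder` in tokenizer_config.json.
# The "999999" entry is hypothetical and only illustrates a non-special token.
CONFIG_EXCERPT = json.loads("""
{
  "added_tokens_decoder": {
    "5809":   {"content": "\\ufffd", "special": true},
    "100259": {"content": "<|fim_middle|>", "special": true},
    "100351": {"content": "</think>", "special": true},
    "999999": {"content": "<hypothetical_plain_token>", "special": false}
  }
}
""")


def special_tokens(config):
    """Return {token_id: content} for every added token flagged as special."""
    return {
        int(token_id): entry["content"]
        for token_id, entry in config["added_tokens_decoder"].items()
        if entry.get("special", False)
    }


flagged = special_tokens(CONFIG_EXCERPT)
```

Running the same function over the full `tokenizer_config.json` before and after the change would show which tokens the pull request reclassifies.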