README.md CHANGED
@@ -26,7 +26,7 @@ library_name: transformers
26
 
27
  # Phi-4-reasoning Model Card
28
 
29
- [Phi-4-reasoning Technical Report](https://huggingface.co/papers/2504.21318)
30
 
31
  ## Model Summary
32
 
@@ -53,56 +53,6 @@ library_name: transformers
53
  | **Primary Use Cases** | Our model is designed to accelerate research on language models, for use as a building block for generative AI powered features. It provides uses for general purpose AI systems and applications (primarily in English) which require:<br><br>1. Memory/compute constrained environments.<br>2. Latency bound scenarios.<br>3. Reasoning and logic. |
54
  | **Out-of-Scope Use Cases** | This model is designed and tested for math reasoning only. Our models are not specifically designed or evaluated for all downstream purposes. Developers should consider common limitations of language models as they select use cases, and evaluate and mitigate for accuracy, safety, and fairness before using within a specific downstream use case, particularly for high-risk scenarios. Developers should be aware of and adhere to applicable laws or regulations (including privacy, trade compliance laws, etc.) that are relevant to their use case, including the model’s focus on English. Review the Responsible AI Considerations section below for further guidance when choosing a use case. Nothing contained in this Model Card should be interpreted as or deemed a restriction or modification to the license the model is released under. |
55
 
56
- ## Usage
57
-
58
- > [!IMPORTANT]
59
- > To fully take advantage of the model's capabilities, inference must use `temperature=0.8`, `top_k=50`, `top_p=0.95`, and `do_sample=True`. For more complex queries, set `max_new_tokens=32768` to allow for longer chain-of-thought (CoT).
60
-
61
- ### Input Formats
62
-
63
- Given the nature of the training data, **always use** ChatML template with the **following system prompt** for inference:
64
-
65
- ```bash
66
- <|im_start|>system<|im_sep|>
67
- You are Phi, a language model trained by Microsoft to help users. Your role as an assistant involves thoroughly exploring questions through a systematic thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution using the specified format: <think> {Thought section} </think> {Solution section}. In the Thought section, detail your reasoning process in steps. Each step should include detailed considerations such as analysing questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The Solution section should be logical, accurate, and concise and detail necessary steps needed to reach the conclusion. Now, try to solve the following question through the above guidelines:<|im_end|>
68
- <|im_start|>user<|im_sep|>
69
- What is the derivative of x^2?<|im_end|>
70
- <|im_start|>assistant<|im_sep|>
71
- ```
72
-
73
- ### With `transformers`
74
-
75
- ```python
76
- from transformers import AutoTokenizer, AutoModelForCausalLM
77
-
78
- tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-4-reasoning")
79
- model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-4-reasoning", device_map="auto", torch_dtype="auto")
80
-
81
- messages = [
82
- {"role": "system", "content": "You are Phi, a language model trained by Microsoft to help users. Your role as an assistant involves thoroughly exploring questions through a systematic thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution using the specified format: <think> {Thought section} </think> {Solution section}. In the Thought section, detail your reasoning process in steps. Each step should include detailed considerations such as analysing questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The Solution section should be logical, accurate, and concise and detail necessary steps needed to reach the conclusion. Now, try to solve the following question through the above guidelines:"},
83
- {"role": "user", "content": "What is the derivative of x^2?"},
84
- ]
85
- inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
86
-
87
- outputs = model.generate(
88
- inputs.to(model.device),
89
- max_new_tokens=4096,
90
- temperature=0.8,
91
- top_k=50,
92
- top_p=0.95,
93
- do_sample=True,
94
- )
95
- print(tokenizer.decode(outputs[0]))
96
- ```
97
-
98
- ### With `vllm`
99
-
100
- ```bash
101
- vllm serve microsoft/Phi-4-reasoning --enable-reasoning --reasoning-parser deepseek_r1
102
- ```
103
-
104
- *Phi-4-reasoning is also supported out-of-the-box by Ollama, llama.cpp, and any Phi-4 compatible framework.*
105
-
106
  ## Data Overview
107
 
108
  ### Training Datasets
@@ -187,6 +137,56 @@ At the high-level overview of the model quality on representative benchmarks. Fo
187
 
188
  Overall, Phi-4-reasoning, with only 14B parameters, performs well across a wide range of reasoning tasks, outperforming significantly larger open-weight models such as DeepSeek-R1 distilled 70B model and approaching the performance levels of full DeepSeek R1 model. We also test the models on multiple new reasoning benchmarks for algorithmic problem solving and planning, including 3SAT, TSP, and BA-Calendar. These new tasks are nominally out-of-domain for the models as the training process did not intentionally target these skills, but the models still show strong generalization to these tasks. Furthermore, when evaluating performance against standard general abilities benchmarks such as instruction following or non-reasoning tasks, we find that our new models improve significantly from Phi-4, despite the post-training being focused on reasoning skills in specific domains.
189
 
190
  ## Responsible AI Considerations
191
 
192
  Like other language models, Phi-4-reasoning can potentially behave in ways that are unfair, unreliable, or offensive. Some of the limiting behaviors to be aware of include:
@@ -213,6 +213,4 @@ Developers should apply responsible AI best practices and are responsible for en
213
 
214
  * **Generation of Harmful Content:** Developers should assess outputs for their context and use available safety classifiers or custom solutions appropriate for their use case.
215
 
216
- * **Misuse:** Other forms of misuse such as fraud, spam, or malware production may be possible, and developers should ensure that their applications do not violate applicable laws and regulations.
217
-
218
- * **Data Summary:** https://huggingface.co/microsoft/Phi-4-reasoning/blob/main/data_summary_card.md
 
26
 
27
  # Phi-4-reasoning Model Card
28
 
29
+ [Phi-4-reasoning Technical Report](https://aka.ms/phi-reasoning/techreport)
30
 
31
  ## Model Summary
32
 
 
53
  | **Primary Use Cases** | Our model is designed to accelerate research on language models, for use as a building block for generative AI powered features. It provides uses for general purpose AI systems and applications (primarily in English) which require:<br><br>1. Memory/compute constrained environments.<br>2. Latency bound scenarios.<br>3. Reasoning and logic. |
54
  | **Out-of-Scope Use Cases** | This model is designed and tested for math reasoning only. Our models are not specifically designed or evaluated for all downstream purposes. Developers should consider common limitations of language models as they select use cases, and evaluate and mitigate for accuracy, safety, and fairness before using within a specific downstream use case, particularly for high-risk scenarios. Developers should be aware of and adhere to applicable laws or regulations (including privacy, trade compliance laws, etc.) that are relevant to their use case, including the model’s focus on English. Review the Responsible AI Considerations section below for further guidance when choosing a use case. Nothing contained in this Model Card should be interpreted as or deemed a restriction or modification to the license the model is released under. |
55
 
56
  ## Data Overview
57
 
58
  ### Training Datasets
 
137
 
138
  Overall, Phi-4-reasoning, with only 14B parameters, performs well across a wide range of reasoning tasks, outperforming significantly larger open-weight models such as DeepSeek-R1 distilled 70B model and approaching the performance levels of full DeepSeek R1 model. We also test the models on multiple new reasoning benchmarks for algorithmic problem solving and planning, including 3SAT, TSP, and BA-Calendar. These new tasks are nominally out-of-domain for the models as the training process did not intentionally target these skills, but the models still show strong generalization to these tasks. Furthermore, when evaluating performance against standard general abilities benchmarks such as instruction following or non-reasoning tasks, we find that our new models improve significantly from Phi-4, despite the post-training being focused on reasoning skills in specific domains.
139
 
140
+ ## Usage
141
+
142
+ ### Inference Parameters
143
+
144
+ For best results, run inference with `temperature=0.8`, `top_p=0.95`, and `do_sample=True`. For more complex queries, raise the generation limit to 32k tokens (e.g. `max_new_tokens=32768` in `transformers`) to allow for a longer chain-of-thought (CoT).
145
+
146
+ ### Input Formats
147
+
148
+ Given the nature of the training data, always use the ChatML template with the following system prompt for inference:
149
+
150
+ ```text
151
+ <|im_start|>system<|im_sep|>
152
+ You are Phi, a language model trained by Microsoft to help users. Your role as an assistant involves thoroughly exploring questions through a systematic thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution using the specified format: <think> {Thought section} </think> {Solution section}. In the Thought section, detail your reasoning process in steps. Each step should include detailed considerations such as analysing questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The Solution section should be logical, accurate, and concise and detail necessary steps needed to reach the conclusion. Now, try to solve the following question through the above guidelines:<|im_end|>
153
+ <|im_start|>user<|im_sep|>
154
+ What is the derivative of x^2?<|im_end|>
155
+ <|im_start|>assistant<|im_sep|>
156
+ ```
157
+
158
+ ### With `transformers`
159
+
160
+ ```python
161
+ from transformers import AutoTokenizer, AutoModelForCausalLM
162
+
163
+ tokenizer = AutoTokenizer.from_pretrained("microsoft/Phi-4-reasoning")
164
+ model = AutoModelForCausalLM.from_pretrained("microsoft/Phi-4-reasoning", device_map="auto", torch_dtype="auto")
165
+
166
+ messages = [
167
+ {"role": "system", "content": "You are Phi, a language model trained by Microsoft to help users. Your role as an assistant involves thoroughly exploring questions through a systematic thinking process before providing the final precise and accurate solutions. This requires engaging in a comprehensive cycle of analysis, summarizing, exploration, reassessment, reflection, backtracing, and iteration to develop well-considered thinking process. Please structure your response into two main sections: Thought and Solution using the specified format: <think> {Thought section} </think> {Solution section}. In the Thought section, detail your reasoning process in steps. Each step should include detailed considerations such as analysing questions, summarizing relevant findings, brainstorming new ideas, verifying the accuracy of the current steps, refining any errors, and revisiting previous steps. In the Solution section, based on various attempts, explorations, and reflections from the Thought section, systematically present the final solution that you deem correct. The Solution section should be logical, accurate, and concise and detail necessary steps needed to reach the conclusion. Now, try to solve the following question through the above guidelines:"},
168
+ {"role": "user", "content": "What is the derivative of x^2?"},
169
+ ]
170
+ inputs = tokenizer.apply_chat_template(messages, tokenize=True, add_generation_prompt=True, return_tensors="pt")
171
+
172
+ outputs = model.generate(
173
+ inputs.to(model.device),
174
+ max_new_tokens=4096,
175
+ temperature=0.8,
176
+ top_p=0.95,
177
+ do_sample=True,
178
+ )
179
+ print(tokenizer.decode(outputs[0]))
180
+ ```
181
+
182
+ ### With `vllm`
183
+
184
+ ```bash
185
+ vllm serve microsoft/Phi-4-reasoning --enable-reasoning --reasoning-parser deepseek_r1
186
+ ```
187
+
188
+ *Phi-4-reasoning is also supported out-of-the-box by Ollama, llama.cpp, and other Phi-4-compatible frameworks.*
189
+
190
  ## Responsible AI Considerations
191
 
192
  Like other language models, Phi-4-reasoning can potentially behave in ways that are unfair, unreliable, or offensive. Some of the limiting behaviors to be aware of include:
 
213
 
214
  * **Generation of Harmful Content:** Developers should assess outputs for their context and use available safety classifiers or custom solutions appropriate for their use case.
215
 
216
+ * **Misuse:** Other forms of misuse such as fraud, spam, or malware production may be possible, and developers should ensure that their applications do not violate applicable laws and regulations.
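The raw ChatML prompt shown in the README can also be assembled by hand, which is useful for runtimes that accept a plain prompt string rather than a message list. The following is a minimal sketch, assuming the line breaks exactly as rendered in the README example; `build_chatml_prompt` is a hypothetical helper, and `tokenizer.apply_chat_template` (used in the `transformers` example) remains the authoritative source for exact whitespace.

```python
# Hypothetical helper: mirrors the ChatML layout displayed in the README.
# Assumption: line breaks follow the README rendering; the tokenizer's own
# chat template may place whitespace differently.


def build_chatml_prompt(system: str, user: str) -> str:
    """Return a raw ChatML prompt that ends with an open assistant turn."""
    turns = [
        f"<|im_start|>system<|im_sep|>\n{system}<|im_end|>",
        f"<|im_start|>user<|im_sep|>\n{user}<|im_end|>",
        # Leave the assistant turn open so generation continues from here.
        "<|im_start|>assistant<|im_sep|>",
    ]
    return "\n".join(turns)


# System prompt truncated to its first sentence for brevity; use the full
# text from the README in practice.
system = "You are Phi, a language model trained by Microsoft to help users."
print(build_chatml_prompt(system, "What is the derivative of x^2?"))
```

Leaving the assistant turn open corresponds to what `add_generation_prompt=True` requests from the chat template in the `transformers` example.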
 
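The system prompt asks the model to answer as `<think> {Thought section} </think> {Solution section}`, so callers usually want the two parts separated. A minimal sketch follows, assuming the text comes from the generated tokens only (for example `tokenizer.decode(outputs[0][inputs.shape[-1]:])`), because the prompt itself contains the literal tags. It also assumes the tags are kept in the decoded text; since this change marks `</think>` as a special token, decoding with `skip_special_tokens=True` may strip it. `split_reasoning` is a hypothetical helper, not part of `transformers`.

```python
def split_reasoning(generated: str) -> tuple[str, str]:
    """Split generated text into (thought, solution).

    Assumes the '<think> ... </think> ...' layout requested by the system
    prompt. Pass only newly generated text, not the prompt: the system
    prompt contains the literal tags too.
    """
    text = generated.strip()
    # Remove a trailing end-of-turn marker if it was decoded.
    text = text.removesuffix("<|im_end|>").rstrip()
    # Drop the opening tag if the model emitted one.
    if text.startswith("<think>"):
        text = text[len("<think>"):]
    thought, sep, solution = text.partition("</think>")
    if not sep:
        # No closing tag (e.g. the token limit was hit mid-thought):
        # treat everything as unfinished reasoning.
        return thought.strip(), ""
    return thought.strip(), solution.strip()


thought, solution = split_reasoning(
    "<think>By the power rule, d/dx x^2 = 2x.</think> The derivative is 2x.<|im_end|>"
)
print(solution)  # The derivative is 2x.
```

Returning an empty solution when `</think>` is missing makes truncated generations easy to detect, which matters when `max_new_tokens` is far below the 32k suggested for complex queries.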
 
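Once `vllm serve` is running, the model can be queried through vLLM's OpenAI-compatible chat completions endpoint. Below is a hedged sketch using only the standard library. The URL assumes vLLM's default host and port; `build_chat_request`, `extract_answer`, and `post_json` are hypothetical helpers; and the `reasoning_content` field is where vLLM's reasoning parser is expected to place the thought section (check the vLLM documentation for your version). `max_tokens` defaults to 4096 here because prompt plus completion must fit the served context length.

```python
import json
import urllib.request

# Assumption: vLLM's default bind address and OpenAI-compatible route.
SERVER_URL = "http://localhost:8000/v1/chat/completions"


def build_chat_request(system: str, user: str, max_tokens: int = 4096) -> dict:
    """Build a chat completion body using the README's sampling settings."""
    return {
        "model": "microsoft/Phi-4-reasoning",
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "temperature": 0.8,
        "top_p": 0.95,
        # Prompt tokens plus max_tokens must fit the served context length.
        "max_tokens": max_tokens,
    }


def extract_answer(response: dict) -> tuple[str, str]:
    """Return (reasoning, answer) from a chat completion response body.

    Assumes a reasoning parser is enabled, so the thought section is in
    message["reasoning_content"] and the final answer in message["content"].
    """
    message = response["choices"][0]["message"]
    return message.get("reasoning_content") or "", message.get("content") or ""


def post_json(body: dict, url: str = SERVER_URL) -> dict:
    """POST a JSON body and decode the JSON reply (needs a running server)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


# Usage against a live server (system_prompt is the full README prompt):
# reasoning, answer = extract_answer(
#     post_json(build_chat_request(system_prompt, "What is the derivative of x^2?")))
```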
data_summary_card.md DELETED
@@ -1,149 +0,0 @@
1
-
2
-
3
- # Data Summary for microsoft_Phi-4-reasoning
4
-
5
-
6
-
7
-
8
-
9
- ## 1. General information
10
-
11
- **1.0.1 Version of the Summary:** 1.0
12
-
13
-
14
-
15
- **1.0.2 Last update:** 24-Nov-2025
16
-
17
-
18
-
19
- ## 1.1 Model Developer Identification
20
-
21
- **1.1.1 Model Developer name and contact details:** Microsoft Corporation at One Microsoft Way, Redmond, WA 98052. Tel: 425-882-8080
22
-
23
-
24
-
25
- ## 1.2 Model Identification
26
-
27
- **1.2.1 Versioned model name(s):** Phi-4-reasoning
28
-
29
-
30
-
31
- **1.2.2 Model release date:** 30-Apr-2025
32
-
33
-
34
-
35
- ## 1.3 Overall training data size and characteristics
36
-
37
- ### 1.3.1 Size of dataset and characteristics
38
-
39
- **1.3.1.A Text training data size:** 1 billion to 1 trillion tokens
40
-
41
-
42
-
43
-
44
-
45
- **1.3.1.B Text training data content:** Prompts sourced from publicly available websites, existing datasets, and licensed collections, augmented with synthetically generated problems; responses generated using o3-mini including chain-of-thought traces; includes STEM, coding, logical puzzles, and safety/Responsible AI alignment data
46
-
47
-
48
-
49
- **1.3.1.C Image training data size:** Not applicable. Images are not part of the training data
50
-
51
-
52
-
53
- **1.3.1.D Image training data content:** Not applicable
54
-
55
-
56
-
57
- **1.3.1.E Audio training data size:** Not applicable. Audio data is not part of the training data
58
-
59
-
60
-
61
- **1.3.1.F Audio training data content:** Not applicable
62
-
63
-
64
-
65
- **1.3.1.G Video training data size:** Not applicable. Video data is not part of the training data
66
-
67
-
68
-
69
- **1.3.1.H Video training data content:** Not applicable
70
-
71
-
72
-
73
- **1.3.1.I Other training data size:** Not applicable
74
-
75
-
76
-
77
- **1.3.1.J Other training data content:** Not applicable
78
-
79
-
80
-
81
- **1.3.2 Latest date of data acquisition/collection for model training:** 31-Mar-2025
82
-
83
-
84
-
85
- **1.3.3 Is data collection ongoing to update the model with new data collection after deployment?** No
86
-
87
-
88
-
89
- **1.3.4 Date the training dataset was first used to train the model:** 01-Jan-2025
90
-
91
-
92
-
93
- **1.3.5 Rationale or purpose of data selection:** Datasets were curated to emphasize complex multi-step reasoning and verifiable solutions across STEM, coding, and safety, selecting prompts at the boundary of base model capabilities. Synthetic problems and teacher-generated reasoning traces were used to distill structured chain-of-thought and promote concise, checkable answers, supporting robust reasoning performance and generalization to broader tasks
94
-
95
-
96
-
97
- ## 2. List of data sources
98
-
99
- ### 2.1 Publicly available datasets
100
-
101
- **2.1.1 Have you used publicly available datasets to train the model?** Yes
102
-
103
-
104
-
105
- ## 2.2 Private non-publicly available datasets obtained from third parties
106
-
107
- ### 2.2.1 Datasets commercially licensed by rights holders or their representatives
108
-
109
- **2.2.1.A Have you concluded transactional commercial licensing agreement(s) with rights holder(s) or with their representatives?** Yes
110
-
111
-
112
-
113
- ### 2.2.2 Private datasets obtained from other third-parties
114
-
115
- **2.2.2.A Have you obtained private datasets from third parties that are not licensed as described in Section 2.2.1, such as data obtained from providers of private databases, or data intermediaries?** This information cannot be provided due to unavailability of the underlying data (e.g., loss, corruption, or other access limitations)
116
-
117
-
118
-
119
- ## 2.3 Personal Information
120
-
121
- **2.3.1 Was personal data used to train the model?** Microsoft follows all relevant laws and regulations pertaining to personal information
122
-
123
-
124
-
125
- ## 2.4 Synthetic data
126
-
127
- **2.4.1 Was any synthetic AI-generated data used to train the model?** Yes
128
-
129
-
130
-
131
- ## 3. Data processing aspects
132
-
133
- ### 3.1 Respect of reservation of rights from text and data mining exception or limitation
134
-
135
- **3.1.1 Does this dataset include any data protected by copyright, trademark, or patent?** Microsoft follows all required regulations and laws for processing data protected by copyright, trademark, or patent
136
-
137
-
138
-
139
- ## 3.2 Other information
140
-
141
- **3.2.1 Does the dataset include information about consumer groups without revealing individual consumer identities?** Microsoft follows all required regulations and laws for protecting consumer identities
142
-
143
-
144
-
145
- **3.2.2 Was the dataset cleaned or modified before model training?** Yes
146
-
147
-
148
-
149
-
 
generation_config.json CHANGED
@@ -5,7 +5,6 @@
5
  "eos_token_id": 100265,
6
  "pad_token_id": 100349,
7
  "temperature": 0.8,
8
- "top_k": 50,
9
  "top_p": 0.95,
10
  "transformers_version": "4.51.1"
11
  }
 
5
  "eos_token_id": 100265,
6
  "pad_token_id": 100349,
7
  "temperature": 0.8,
 
8
  "top_p": 0.95,
9
  "transformers_version": "4.51.1"
10
  }
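With `top_k` dropped from `generation_config.json`, the shipped sampling settings are `temperature` 0.8 and `top_p` 0.95. The pure-Python sketch below illustrates one common formulation of what those two settings do to a next-token distribution: divide logits by the temperature, apply softmax, keep the smallest set of most-probable tokens whose cumulative probability reaches `top_p` (nucleus sampling), and renormalize. It is an illustration of the technique, not the `transformers` implementation. Note also that `transformers` documents a `GenerationConfig` default `top_k` of 50, so a top-k cut may still apply there even though the key is gone; other runtimes have their own defaults.

```python
import math
import random


def nucleus_filter(logits: list[float], temperature: float = 0.8,
                   top_p: float = 0.95) -> list[float]:
    """Return sampling probabilities after temperature scaling and top-p filtering.

    Tokens outside the nucleus get probability 0; the rest are renormalized.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Visit tokens from most to least probable, keeping them until the
    # cumulative mass reaches top_p (the top token is always kept).
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = set(), 0.0
    for i in order:
        keep.add(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    kept_mass = sum(probs[i] for i in keep)
    return [probs[i] / kept_mass if i in keep else 0.0 for i in range(len(probs))]


# Toy 4-token vocabulary: the least likely token falls outside the nucleus.
probs = nucleus_filter([2.0, 1.0, 0.5, -3.0])
next_token = random.choices(range(len(probs)), weights=probs, k=1)[0]
```

Lower temperatures sharpen the distribution before the nucleus is computed, so the two settings interact: at 0.8 the nucleus is usually smaller than it would be at 1.0 for the same `top_p`.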
special_tokens_map.json CHANGED
@@ -19,5 +19,12 @@
19
  "normalized": false,
20
  "rstrip": true,
21
  "single_word": false
22
  }
23
  }
 
19
  "normalized": false,
20
  "rstrip": true,
21
  "single_word": false
22
+ },
23
+ "unk_token": {
24
+ "content": "�",
25
+ "lstrip": false,
26
+ "normalized": false,
27
+ "rstrip": false,
28
+ "single_word": false
29
  }
30
  }
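The new `unk_token` (and the added token with id 5809 in `tokenizer.json`) is the single character `�`, U+FFFD REPLACEMENT CHARACTER. As a short illustration of that character only, Python produces it when bytes that are not valid UTF-8 are decoded with `errors="replace"`, which is also what happens when a byte-level token covering only part of a multi-byte character is decoded on its own. The diff does not state why this character was chosen as `unk_token`; `decode_fragment` is a hypothetical helper.

```python
import unicodedata

# U+FFFD is the Unicode REPLACEMENT CHARACTER set as unk_token in this change.
REPLACEMENT = "\ufffd"


def decode_fragment(data: bytes) -> str:
    """Decode bytes as UTF-8, substituting U+FFFD for invalid sequences."""
    return data.decode("utf-8", errors="replace")


full = "\u00e9".encode("utf-8")  # 'é' is two bytes in UTF-8: b'\xc3\xa9'
print(unicodedata.name(REPLACEMENT))          # REPLACEMENT CHARACTER
print(decode_fragment(full) == "\u00e9")      # True
print(decode_fragment(full[:1]) == REPLACEMENT)  # True: incomplete sequence
```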
tokenizer.json CHANGED
@@ -3,6 +3,15 @@
3
  "truncation": null,
4
  "padding": null,
5
  "added_tokens": [
6
  {
7
  "id": 100256,
8
  "content": "<|dummy_0|>",
@@ -28,7 +37,7 @@
28
  "lstrip": true,
29
  "rstrip": true,
30
  "normalized": false,
31
- "special": false
32
  },
33
  {
34
  "id": 100259,
@@ -37,7 +46,7 @@
37
  "lstrip": true,
38
  "rstrip": true,
39
  "normalized": false,
40
- "special": false
41
  },
42
  {
43
  "id": 100260,
@@ -46,7 +55,7 @@
46
  "lstrip": true,
47
  "rstrip": true,
48
  "normalized": false,
49
- "special": false
50
  },
51
  {
52
  "id": 100261,
@@ -856,7 +865,7 @@
856
  "lstrip": true,
857
  "rstrip": true,
858
  "normalized": false,
859
- "special": false
860
  },
861
  {
862
  "id": 100351,
@@ -865,7 +874,7 @@
865
  "lstrip": true,
866
  "rstrip": true,
867
  "normalized": false,
868
- "special": false
869
  }
870
  ],
871
  "normalizer": null,
 
3
  "truncation": null,
4
  "padding": null,
5
  "added_tokens": [
6
+ {
7
+ "id": 5809,
8
+ "content": "�",
9
+ "single_word": false,
10
+ "lstrip": false,
11
+ "rstrip": false,
12
+ "normalized": false,
13
+ "special": true
14
+ },
15
  {
16
  "id": 100256,
17
  "content": "<|dummy_0|>",
 
37
  "lstrip": true,
38
  "rstrip": true,
39
  "normalized": false,
40
+ "special": true
41
  },
42
  {
43
  "id": 100259,
 
46
  "lstrip": true,
47
  "rstrip": true,
48
  "normalized": false,
49
+ "special": true
50
  },
51
  {
52
  "id": 100260,
 
55
  "lstrip": true,
56
  "rstrip": true,
57
  "normalized": false,
58
+ "special": true
59
  },
60
  {
61
  "id": 100261,
 
865
  "lstrip": true,
866
  "rstrip": true,
867
  "normalized": false,
868
+ "special": true
869
  },
870
  {
871
  "id": 100351,
 
874
  "lstrip": true,
875
  "rstrip": true,
876
  "normalized": false,
877
+ "special": true
878
  }
879
  ],
880
  "normalizer": null,
tokenizer_config.json CHANGED
@@ -1,6 +1,14 @@
1
  {
2
  "add_prefix_space": false,
3
  "added_tokens_decoder": {
4
  "100256": {
5
  "content": "<|dummy_0|>",
6
  "lstrip": true,
@@ -23,7 +31,7 @@
23
  "normalized": false,
24
  "rstrip": true,
25
  "single_word": false,
26
- "special": false
27
  },
28
  "100259": {
29
  "content": "<|fim_middle|>",
@@ -31,7 +39,7 @@
31
  "normalized": false,
32
  "rstrip": true,
33
  "single_word": false,
34
- "special": false
35
  },
36
  "100260": {
37
  "content": "<|fim_suffix|>",
@@ -39,7 +47,7 @@
39
  "normalized": false,
40
  "rstrip": true,
41
  "single_word": false,
42
- "special": false
43
  },
44
  "100261": {
45
  "content": "<|dummy_1|>",
@@ -759,7 +767,7 @@
759
  "normalized": false,
760
  "rstrip": true,
761
  "single_word": false,
762
- "special": false
763
  },
764
  "100351": {
765
  "content": "</think>",
@@ -767,7 +775,7 @@
767
  "normalized": false,
768
  "rstrip": true,
769
  "single_word": false,
770
- "special": false
771
  }
772
  },
773
  "bos_token": "<|endoftext|>",
@@ -778,5 +786,6 @@
778
  "model_max_length": 32768,
779
  "pad_token": "<|dummy_85|>",
780
  "padding_side": "left",
781
- "tokenizer_class": "GPT2Tokenizer"
 
782
  }
 
1
  {
2
  "add_prefix_space": false,
3
  "added_tokens_decoder": {
4
+ "5809": {
5
+ "content": "�",
6
+ "lstrip": false,
7
+ "normalized": false,
8
+ "rstrip": false,
9
+ "single_word": false,
10
+ "special": true
11
+ },
12
  "100256": {
13
  "content": "<|dummy_0|>",
14
  "lstrip": true,
 
31
  "normalized": false,
32
  "rstrip": true,
33
  "single_word": false,
34
+ "special": true
35
  },
36
  "100259": {
37
  "content": "<|fim_middle|>",
 
39
  "normalized": false,
40
  "rstrip": true,
41
  "single_word": false,
42
+ "special": true
43
  },
44
  "100260": {
45
  "content": "<|fim_suffix|>",
 
47
  "normalized": false,
48
  "rstrip": true,
49
  "single_word": false,
50
+ "special": true
51
  },
52
  "100261": {
53
  "content": "<|dummy_1|>",
 
767
  "normalized": false,
768
  "rstrip": true,
769
  "single_word": false,
770
+ "special": true
771
  },
772
  "100351": {
773
  "content": "</think>",
 
775
  "normalized": false,
776
  "rstrip": true,
777
  "single_word": false,
778
+ "special": true
779
  }
780
  },
781
  "bos_token": "<|endoftext|>",
 
786
  "model_max_length": 32768,
787
  "pad_token": "<|dummy_85|>",
788
  "padding_side": "left",
789
+ "tokenizer_class": "GPT2Tokenizer",
790
+ "unk_token": "�"
791
  }
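Because this change flips several `added_tokens_decoder` entries to `"special": true` and adds `unk_token`, a quick check on a downloaded copy of `tokenizer_config.json` can confirm the update was picked up. The sketch below uses an abridged excerpt of the new file (most tokens and fields omitted); in practice you would `json.load` the real file. `special_token_ids` and `unk_token_id` are hypothetical helpers, and handling `unk_token` as either a string or a dict with a `content` key is an assumption about the formats such configs may use.

```python
import json
from typing import Optional

# Abridged excerpt of the updated tokenizer_config.json from this diff.
CONFIG_EXCERPT = """
{
  "added_tokens_decoder": {
    "5809": {"content": "\\ufffd", "special": true},
    "100259": {"content": "<|fim_middle|>", "special": true},
    "100260": {"content": "<|fim_suffix|>", "special": true},
    "100351": {"content": "</think>", "special": true}
  },
  "unk_token": "\\ufffd"
}
"""


def special_token_ids(config: dict) -> list[int]:
    """Return the ids in added_tokens_decoder that are flagged as special."""
    decoder = config["added_tokens_decoder"]
    return sorted(int(i) for i, tok in decoder.items() if tok.get("special"))


def unk_token_id(config: dict) -> Optional[int]:
    """Resolve unk_token to its id via added_tokens_decoder, if present."""
    unk = config.get("unk_token")
    if isinstance(unk, dict):  # some configs store an AddedToken-style dict
        unk = unk.get("content")
    for i, tok in config["added_tokens_decoder"].items():
        if tok["content"] == unk:
            return int(i)
    return None


config = json.loads(CONFIG_EXCERPT)
print(special_token_ids(config))  # [5809, 100259, 100260, 100351]
print(unk_token_id(config))       # 5809
```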