---
license: apache-2.0
base_model:
- microsoft/Phi-4-reasoning
pipeline_tag: text-generation
library_name: transformers
---


## Phi-4 Reasoning Quantized

---

### **🚀 Model Description**

This is an **int8 quantized version** of **Phi-4 Reasoning**, optimized using **torchao** for reduced memory footprint and accelerated inference. The quantization applies **int8 weights with dynamic int8 activations**, maintaining high task performance while enabling efficient deployment on consumer and edge hardware.

---

### **Quantization Details**

* **Method:** torchao quantization
* **Weight Precision:** int8
* **Activation Precision:** int8 dynamic
* **Technique:** Symmetric mapping
* **Impact:** Significant reduction in model size with minimal loss in reasoning, coding, and general instruction-following capabilities.
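
The symmetric mapping listed above can be illustrated in a few lines of plain Python: each tensor is scaled by `max|w| / 127`, rounded and clamped to the int8 range, and dequantized by multiplying the scale back. This is a toy per-tensor sketch of the scheme, not the torchao kernel; under dynamic activation quantization, the same mapping is applied to activations on the fly at inference time.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: scale = max|x| / 127."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid div-by-zero for all-zero tensors
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to floats by rescaling."""
    return [v * scale for v in q]

# Round-trip a small weight vector; error is bounded by half the scale.
weights = [0.5, -1.27, 0.02, 1.0]
q, s = quantize_int8(weights)
restored = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Because the mapping is symmetric around zero (no zero-point offset), int8 matrix multiplies reduce to integer arithmetic followed by a single per-tensor rescale, which is what makes the scheme cheap at inference time.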

---

### **🎯 Intended Use**

* Fast inference in **production environments with limited VRAM**
* Research on **int8 quantization deployment performance**
* Tasks: general reasoning, chain-of-thought, code generation, and long-context tasks.

---

### **⚠️ Limitations**

* Slight degradation in performance compared to the original bfloat16 model
* English-centric training data; may underperform in other languages or nuanced tasks
* Further finetuning or quantization-aware calibration can enhance task-specific performance.

---