---
license: apache-2.0
base_model:
- microsoft/Phi-4-reasoning
pipeline_tag: text-generation
library_name: transformers
---


## Phi-4 Reasoning Quantized

---

### **🚀 Model Description**

This is an **int8 quantized version** of **Phi-4 Reasoning**, optimized using **torchao** for reduced memory footprint and accelerated inference. The quantization applies **int8 weights with dynamic int8 activations**, maintaining high task performance while enabling efficient deployment on consumer and edge hardware.

---

### **Quantization Details**

* **Method:** torchao quantization
* **Weight Precision:** int8
* **Activation Precision:** int8 dynamic
* **Technique:** Symmetric mapping
* **Impact:** Significant reduction in model size with minimal loss in reasoning, coding, and general instruction-following capabilities.
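
The symmetric mapping listed above can be illustrated in a few lines of plain Python: each tensor is scaled by `max|w| / 127`, rounded and clamped to the int8 range, and dequantized by multiplying the scale back. This is a toy per-tensor sketch of the scheme, not the torchao kernel; under dynamic activation quantization, the same mapping is applied to activations on the fly at inference time.

```python
def quantize_int8(values):
    """Symmetric per-tensor int8 quantization: scale = max|x| / 127."""
    scale = max(abs(v) for v in values) / 127 or 1.0  # avoid div-by-zero for all-zero tensors
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map int8 codes back to floats by rescaling."""
    return [v * scale for v in q]

# Round-trip a small weight vector; error is bounded by half the scale.
weights = [0.5, -1.27, 0.02, 1.0]
q, s = quantize_int8(weights)
restored = dequantize(q, s)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Because the mapping is symmetric around zero (no zero-point offset), int8 matrix multiplies reduce to integer arithmetic followed by a single per-tensor rescale, which is what makes the scheme cheap at inference time.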

---

### **🎯 Intended Use**

* Fast inference in **production environments with limited VRAM**
* Research on **int8 quantization deployment performance**
* Tasks: general reasoning, chain-of-thought, code generation, and long-context tasks.

---

### **⚠️ Limitations**

* Slight degradation in performance compared to the original bfloat16 model
* English-centric training data; may underperform in other languages or nuanced tasks
* Further finetuning or quantization-aware calibration can enhance task-specific performance.

---