---
license: apache-2.0
base_model:
- microsoft/Phi-4-reasoning
pipeline_tag: text-generation
library_name: transformers
---
## Phi-4 Reasoning Quantized
---
### **🚀 Model Description**
This is an **int8 quantized version** of **Phi-4 Reasoning**, optimized using **torchao** for reduced memory footprint and accelerated inference. The quantization applies **int8 weights with dynamic int8 activations**, maintaining high task performance while enabling efficient deployment on consumer and edge hardware.
---
### **⚙️ Quantization Details**
* **Method:** torchao quantization
* **Weight Precision:** int8
* **Activation Precision:** int8 dynamic
* **Technique:** Symmetric mapping
* **Impact:** Significant reduction in model size with minimal loss in reasoning, coding, and general instruction-following capabilities.
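To make the symmetric mapping concrete, here is a minimal per-tensor sketch of how float weights map to int8 with a single scale. This is an illustration of the general technique, not torchao's actual implementation (which quantizes per-channel and fuses the dynamic activation path into the kernels):

```python
# Minimal sketch of symmetric per-tensor int8 quantization.
# Illustrative only -- NOT torchao's implementation.

def quantize_int8_symmetric(weights):
    """Map float weights to int8 symmetrically: s = max|w| / 127, q = round(w / s)."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights: w ≈ q * s."""
    return [qi * scale for qi in q]

weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8_symmetric(weights)
w_hat = dequantize(q, scale)   # each entry within ±scale/2 of the original
```

Because the mapping is symmetric around zero, no zero-point offset is stored; the only metadata per tensor is the scale, and the reconstruction error is bounded by half a quantization step.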
---
### **🎯 Intended Use**
* Fast inference in **production environments with limited VRAM**
* Research on **int8 quantization deployment performance**
* Tasks: general reasoning, chain-of-thought prompting, code generation, and long-context workloads
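As a rough illustration of the VRAM savings, weight storage can be estimated from the parameter count. The ~14B figure below is an assumption based on the Phi-4 base model, and runtime overhead (KV cache, activations) is ignored:

```python
# Back-of-the-envelope weight-memory estimate.
# The 14B parameter count is an assumption; KV cache and activation
# memory are not included.

def weight_gib(num_params, bytes_per_param):
    """Approximate weight storage in GiB."""
    return num_params * bytes_per_param / 2**30

params = 14e9                       # assumed parameter count
bf16 = weight_gib(params, 2)        # bfloat16: 2 bytes per weight, ~26 GiB
int8 = weight_gib(params, 1)        # int8:     1 byte per weight,  ~13 GiB
print(f"bf16 ≈ {bf16:.1f} GiB, int8 ≈ {int8:.1f} GiB ({bf16 / int8:.0f}x smaller)")
```

Under these assumptions, int8 weights halve the load-time footprint relative to bfloat16, which is what brings the model within reach of single consumer GPUs.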
---
### **⚠️ Limitations**
* Slight accuracy degradation compared to the original **bfloat16** model
* English-centric training data; may underperform in other languages or on nuanced tasks
* Further finetuning or quantization-aware calibration can enhance task-specific performance.
---