---
license: apache-2.0
base_model:
- microsoft/Phi-4-reasoning
pipeline_tag: text-generation
library_name: transformers
---
## Phi-4 Reasoning Quantized
---
### **🚀 Model Description**
This is an **int8 quantized version** of **Phi-4 Reasoning**, optimized using **torchao** for reduced memory footprint and accelerated inference. The quantization applies **int8 weights with dynamic int8 activations**, maintaining high task performance while enabling efficient deployment on consumer and edge hardware.
---
### **Quantization Details**
* **Method:** torchao quantization
* **Weight Precision:** int8
* **Activation Precision:** int8 dynamic
* **Technique:** Symmetric mapping
* **Impact:** Significant reduction in model size with minimal loss in reasoning, coding, and general instruction-following capabilities.
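The scheme above can be sketched in plain NumPy: symmetric mapping quantizes weights per output channel ahead of time, while activations are quantized on the fly per token ("dynamic"). This is an illustrative sketch of the idea only, not the actual torchao kernels, and the function names are hypothetical:

```python
import numpy as np

def symmetric_quantize(x, axis):
    # Symmetric mapping: scale = max(|x|) / 127, zero point fixed at 0.
    scale = np.max(np.abs(x), axis=axis, keepdims=True) / 127.0
    scale = np.where(scale == 0, 1.0, scale)  # avoid division by zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def int8_dynamic_linear(activations, weight):
    # Weights are quantized once, per output channel (static int8 weights).
    w_q, w_scale = symmetric_quantize(weight, axis=1)
    # Activations are quantized at inference time, per token (dynamic int8).
    a_q, a_scale = symmetric_quantize(activations, axis=1)
    # Integer matmul in int32, then rescale the accumulator back to float.
    acc = a_q.astype(np.int32) @ w_q.astype(np.int32).T
    return acc * a_scale * w_scale.T

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 16))   # 4 tokens, 16 features
w = rng.normal(size=(8, 16))   # linear layer: 16 -> 8
y_ref = x @ w.T                # float reference
y_q = int8_dynamic_linear(x, w)
print(np.max(np.abs(y_q - y_ref)))  # quantization error stays small
```

Because the mapping is symmetric, no zero points need to be stored or applied inside the integer matmul, which keeps the kernel simple and fast.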
---
### **🎯 Intended Use**
* Fast inference in **production environments with limited VRAM**
* Research on **int8 quantization deployment performance**
* Tasks: general reasoning, chain-of-thought prompting, code generation, and long-context processing.
---
### **⚠️ Limitations**
* Slight performance degradation compared to the original bfloat16 model
* English-centric training data; may underperform in other languages or nuanced tasks
* Further finetuning or quantization-aware calibration can enhance task-specific performance.
---