---
license: apache-2.0
base_model:
- microsoft/Phi-4-reasoning
pipeline_tag: text-generation
library_name: transformers
---

## Phi-4 Reasoning Quantized

---

### **🚀 Model Description**

This is an **int8 quantized version** of **Phi-4 Reasoning**, optimized using **torchao** for reduced memory footprint and accelerated inference. The quantization applies **int8 weights with dynamic int8 activations**, maintaining high task performance while enabling efficient deployment on consumer and edge hardware.
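
As a quick orientation, here is a minimal inference sketch using the standard `transformers` API. It assumes torchao is installed alongside a recent `transformers` release; the repository id shown is a placeholder to replace with this repo's actual id, and the prompt and generation settings are illustrative only.

```python
# Minimal inference sketch (assumes a recent transformers release with torchao
# installed; "your-org/phi-4-reasoning-int8" is a hypothetical placeholder id).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/phi-4-reasoning-int8"  # substitute this repository's id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # place layers on the available GPU(s)/CPU
    torch_dtype="auto",  # keep the dtypes stored in the checkpoint
)

messages = [{"role": "user", "content": "Why is the sum of two odd numbers always even?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```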

---

### **Quantization Details**

* **Method:** torchao quantization (see the reproduction sketch below)
* **Weight Precision:** int8
* **Activation Precision:** int8 dynamic
* **Technique:** Symmetric mapping
* **Impact:** Significant reduction in model size with minimal loss in reasoning, coding, and general instruction-following capabilities.
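
The exact export script for this checkpoint is not documented here; the sketch below shows one plausible way to reproduce an int8-weight, dynamic-int8-activation quantization of the base model via the torchao integration in `transformers`, not necessarily the recipe actually used.

```python
# Sketch of producing an int8-weight / dynamic-int8-activation checkpoint with
# torchao via transformers. Assumes recent transformers + torchao; the output
# directory and exact settings are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig

base_id = "microsoft/Phi-4-reasoning"
quant_config = TorchAoConfig("int8_dynamic_activation_int8_weight")

model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype="auto",
    device_map="auto",
    quantization_config=quant_config,  # weights are quantized as they load
)
tokenizer = AutoTokenizer.from_pretrained(base_id)

# torchao tensors are not safetensors-serializable, so save without safetensors.
model.save_pretrained("phi-4-reasoning-int8", safe_serialization=False)
tokenizer.save_pretrained("phi-4-reasoning-int8")
```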

---

### **🎯 Intended Use**

* Fast inference in **production environments with limited VRAM** (see the footprint check below)
* Research on **int8 quantization deployment performance**
* Tasks: general reasoning, chain-of-thought problem solving, code generation, and long-context workloads.
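
For the limited-VRAM scenarios above, a quick sanity check of the weight memory footprint can be made with the standard `get_memory_footprint()` helper in `transformers`; the snippet below assumes the `model` object from the inference sketch earlier in this card.

```python
# Quick check of how much memory the quantized weights occupy, reusing the
# `model` loaded in the inference sketch above.
footprint_gib = model.get_memory_footprint() / (1024 ** 3)
print(f"Approximate weight memory footprint: {footprint_gib:.2f} GiB")
```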

---

### **⚠️ Limitations**

* Slight degradation in performance compared to the full-precision (bfloat16) base model
* English-centric training data; may underperform in other languages or on nuanced tasks
* Further fine-tuning or quantization-aware calibration can enhance task-specific performance.

---