AINovice2005 committed (verified)
Commit 142c814 · 1 parent: 03e8bac

Update README.md
Files changed (1): README.md (+39 −1)
---
base_model:
- microsoft/Phi-4-reasoning
pipeline_tag: text-generation
library_name: transformers
---

## Phi-4 Reasoning • Int8 Quantized

---

### **🚀 Model Description**

This is an **int8-quantized version** of **Phi-4 Reasoning**, optimized with **torchao** for a reduced memory footprint and accelerated inference. The quantization applies **int8 weights with dynamic int8 activations**, maintaining high task performance while enabling efficient deployment on consumer and edge hardware.

---

### **Quantization Details**

* **Method:** torchao quantization
* **Weight precision:** int8
* **Activation precision:** int8, dynamic (scales computed at runtime)
* **Technique:** symmetric mapping
* **Impact:** significant reduction in model size with minimal loss in reasoning, coding, and general instruction-following capabilities

---
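The symmetric mapping listed above can be shown in a few lines of plain PyTorch: the scale is chosen so the largest absolute value maps to 127 and the zero-point stays at 0. This is a conceptual sketch of the technique, not torchao's internal implementation.

```python
import torch

def int8_symmetric_quantize(w: torch.Tensor):
    # Symmetric mapping: max |value| maps to 127, zero-point is fixed
    # at 0, so 0.0 is always represented exactly.
    scale = w.abs().max().clamp(min=1e-8) / 127.0
    q = torch.round(w / scale).clamp(-127, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.to(torch.float32) * scale

w = torch.randn(16, 16)
q, scale = int8_symmetric_quantize(w)
w_hat = dequantize(q, scale)

# Round-trip error is bounded by half a quantization step (scale / 2).
max_err = (w - w_hat).abs().max().item()
```

Dynamic activation quantization applies this same mapping to each activation tensor at runtime, computing a fresh scale per forward pass instead of baking one in during calibration.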

### **🎯 Intended Use**

* Fast inference in **production environments with limited VRAM**
* Research on **int8 quantization and deployment performance**
* Tasks: general reasoning, chain-of-thought, code generation, and long-context work

---

### **⚠️ Limitations**

* Slight performance degradation compared to the full-precision (bfloat16) model
* English-centric training data; may underperform in other languages or on nuanced tasks
* Further finetuning or quantization-aware calibration can improve task-specific performance

---