microsoft/Phi-3.5-mini-instruct-onnx

nenad1002 committed on Feb 6

Commit dcc76e2 · verified · 1 Parent(s): f627d45

Update README.md

Files changed (1)

README.md +3 -3

@@ -38,7 +38,7 @@ This is an update over the instruction-tuned Phi-3 Mini ONNX model release. We b
 ## What’s New (2026-02)
-This update introduces an improved **INT4 GPU ONNX model** that incorporates **quantization-aware fine-tuning (QAT)** on top of the existing quantization pipeline. The updated model improves accuracy across a broad range of reasoning, knowledge, and commonsense benchmarks while preserving the performance characteristics of ONNX Runtime on GPU. In addition, generation stability is significantly improved by reducing premature end-of-sequence (EOS) termination.
+This update introduces an improved **INT4 GPU ONNX model** that incorporates **quantization-aware fine-tuning (QAT)** on top of the existing quantization pipeline.
 ### Benchmark Accuracy Improvements (INT4 GPU)
@@ -49,7 +49,7 @@ This update introduces an improved **INT4 GPU ONNX model** that incorporates **q
 | Commonsense    | PIQA, Winogrande                    | **+0.5 to +1.0 pts** |
 | Broad Coverage | MMLU (overall)                      | −0.5 pts |
-Overall, the QAT-tuned INT4 GPU model improves performance on the majority of downstream reasoning and QA benchmarks, with a small regression on broad-coverage evaluation.
+The table above provides a high-level summary of observed accuracy deltas across benchmark categories compared to the old INT4 GPU model. The QAT-tuned INT4 GPU model improves performance on the majority of downstream reasoning and QA benchmarks, with a small regression on broad-coverage evaluation.
 ### Generation Stability (EOS Behavior)
@@ -194,7 +194,7 @@ Activation Aware Quantization (AWQ) works by identifying the top 1% most salient
 parinitarahi
 ## Contributors
-Sunghoon Choi, Yufeng Li, Kunal Vaishnavi, Akshay Sonawane, Rui Ren, Parinita Rahi
+Sunghoon Choi, Yufeng Li, Kunal Vaishnavi, Akshay Sonawane, Rui Ren, Parinita Rahi, Nenad Banfic
 ## License
 The model is licensed under the MIT license.