nenad1002 committed on
Commit dcc76e2 · verified · 1 Parent(s): f627d45

Update README.md

  1. README.md +3 -3
README.md CHANGED
@@ -38,7 +38,7 @@ This is an update over the instruction-tuned Phi-3 Mini ONNX model release. We b
 
  ## What’s New (2026-02)
 
- This update introduces an improved **INT4 GPU ONNX model** that incorporates **quantization-aware fine-tuning (QAT)** on top of the existing quantization pipeline. The updated model improves accuracy across a broad range of reasoning, knowledge, and commonsense benchmarks while preserving the performance characteristics of ONNX Runtime on GPU. In addition, generation stability is significantly improved by reducing premature end-of-sequence (EOS) termination.
 
  ### Benchmark Accuracy Improvements (INT4 GPU)
 
@@ -49,7 +49,7 @@ This update introduces an improved **INT4 GPU ONNX model** that incorporates **q
  | Commonsense | PIQA, Winogrande | **+0.5 to +1.0 pts** |
  | Broad Coverage | MMLU (overall) | −0.5 pts |
 
- Overall, the QAT-tuned INT4 GPU model improves performance on the majority of downstream reasoning and QA benchmarks, with a small regression on broad-coverage evaluation.
 
  ### Generation Stability (EOS Behavior)
 
@@ -194,7 +194,7 @@ Activation Aware Quantization (AWQ) works by identifying the top 1% most salient
  parinitarahi
 
  ## Contributors
- Sunghoon Choi, Yufeng Li, Kunal Vaishnavi, Akshay Sonawane, Rui Ren, Parinita Rahi
 
  ## License
  The model is licensed under the MIT license.
 
 
  ## What’s New (2026-02)
 
+ This update introduces an improved **INT4 GPU ONNX model** that incorporates **quantization-aware fine-tuning (QAT)** on top of the existing quantization pipeline.
 
  ### Benchmark Accuracy Improvements (INT4 GPU)
 
 
  | Commonsense | PIQA, Winogrande | **+0.5 to +1.0 pts** |
  | Broad Coverage | MMLU (overall) | −0.5 pts |
 
+ The table above provides a high-level summary of observed accuracy deltas across benchmark categories compared to the old INT4 GPU model. The QAT-tuned INT4 GPU model improves performance on the majority of downstream reasoning and QA benchmarks, with a small regression on broad-coverage evaluation.
 
  ### Generation Stability (EOS Behavior)
 
 
  parinitarahi
 
  ## Contributors
+ Sunghoon Choi, Yufeng Li, Kunal Vaishnavi, Akshay Sonawane, Rui Ren, Parinita Rahi, Nenad Banfic
 
  ## License
  The model is licensed under the MIT license.