Trelis/Microsoft_Phi-4-FP8-Dynamic

compressed-tensors


RonanMcGovern committed on Dec 13, 2024

Commit e01e96e · verified · 1 Parent(s): 379b321

add quantization benchmarking

Files changed (1)

README.md +19 -0

@@ -20,6 +20,25 @@ vLLM compatible model that will run in:
 [One click Runpod template](https://runpod.io/console/deploy?template=rzgcdh9rqe&ref=jmfkcdio) (affiliate link).
 Other templates available from [Trelis' one-click-llms repo](https://github.com/TrelisResearch/one-click-llms).
+## Quantization Benchmarking
+Run using lm_eval (lm-evaluation-harness) on 100 rows of gsm8k (5-shot).
+Base model - 16 bit:
+```
+|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
+|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+|gsm8k|      3|flexible-extract|     5|exact_match|↑  | 0.93|±  |0.0256|
+|     |       |strict-match    |     5|exact_match|↑  | 0.93|±  |0.0256|
+```
+fp8 model:
+```
+|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
+|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+|gsm8k|      3|flexible-extract|     5|exact_match|↑  | 0.93|±  |0.0256|
+|     |       |strict-match    |     5|exact_match|↑  | 0.93|±  |0.0256|
+```
 ---
 # Phi-4
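
Both tables report an accuracy of 0.93 with a standard error of 0.0256 on 100 rows. As a rough sanity check, the sketch below reproduces that figure under the assumption that the reported Stderr is the sample standard error of per-row 0/1 scores (the n-1 form), and derives an approximate 95% interval. The helper names `sample_stderr` and `confidence_interval` are hypothetical and are not part of any evaluation library.

```python
import math


def sample_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n binary (0/1) scores.

    Assumption: the sample variance uses the n-1 denominator, which
    gives sqrt(p * (1 - p) / (n - 1)) for a proportion p.
    """
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


def confidence_interval(accuracy: float, n: int, z: float = 1.96):
    """Normal-approximation interval, clipped to [0, 1]."""
    se = sample_stderr(accuracy, n)
    return max(0.0, accuracy - z * se), min(1.0, accuracy + z * se)


n_rows = 100      # rows of gsm8k evaluated, from the README
base_acc = 0.93   # base 16-bit model, exact_match
fp8_acc = 0.93    # FP8 model, exact_match

se = sample_stderr(base_acc, n_rows)
low, high = confidence_interval(base_acc, n_rows)

print(f"stderr = {se:.4f}")
print(f"approx. 95% interval = [{low:.3f}, {high:.3f}]")
print(f"observed difference = {base_acc - fp8_acc:.2f}")
```

With only 100 rows the interval spans roughly ten percentage points, so identical scores indicate that no large regression is visible at this sample size rather than proving that the two models are exactly equivalent.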