add quantization benchmarking
README.md
CHANGED
@@ -20,6 +20,25 @@ vLLM compatible model that will run in:
 [One click Runpod template](https://runpod.io/console/deploy?template=rzgcdh9rqe&ref=jmfkcdio) (affiliate link).
 Other templates available from [Trelis' one-click-llms repo](https://github.com/TrelisResearch/one-click-llms).
 
+## Quantization Benchmarking
+Run using llm_eval on 100 rows of gsm8k.
+
+Base model - 16 bit:
+```
+|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
+|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+|gsm8k|      3|flexible-extract|     5|exact_match|↑  | 0.93|±  |0.0256|
+|     |       |strict-match    |     5|exact_match|↑  | 0.93|±  |0.0256|
+```
+
+fp8 model:
+```
+|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
+|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
+|gsm8k|      3|flexible-extract|     5|exact_match|↑  | 0.93|±  |0.0256|
+|     |       |strict-match    |     5|exact_match|↑  | 0.93|±  |0.0256|
+```
+
 ---
 # Phi-4
 
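
A note on reading the numbers in this change: the reported stderr of 0.0256 is what you would expect for an accuracy of 0.93 measured on 100 rows. The sketch below reproduces that value. It assumes the harness reports the standard error of the mean of per-row 0/1 scores using the sample (n - 1) variance; that is an assumption, not something the diff states. The function names are illustrative only and come from no library.

```python
import math


def binomial_stderr(accuracy: float, n_rows: int) -> float:
    """Standard error of the mean of n_rows 0/1 scores.

    Assumes the sample (n - 1) variance is used, which is an assumption
    about how the evaluation harness computes its Stderr column.
    """
    return math.sqrt(accuracy * (1.0 - accuracy) / (n_rows - 1))


def difference_stderr(se_a: float, se_b: float) -> float:
    """Standard error of the gap between two scores, treating the two runs
    as independent (a simplification: both runs used the same rows)."""
    return math.hypot(se_a, se_b)


n_rows = 100      # "100 rows of gsm8k" from the README text
accuracy = 0.93   # exact_match value reported for both the 16-bit and fp8 runs

se = binomial_stderr(accuracy, n_rows)
diff_se = difference_stderr(se, se)

print(round(se, 4))       # → 0.0256
print(round(diff_se, 4))  # → 0.0363
```

Under these assumptions, the gap between the two models would need to be roughly 0.07 (about two standard errors of the difference) before a 100-row run could separate them. Identical scores here are therefore consistent with fp8 causing no regression, but they do not rule out a small one.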