Update README.md

README.md (changed)

@@ -115,15 +115,27 @@ lm_eval --model hf --model_args pretrained=pytorch/Phi-4-mini-instruct-int4wo-hq
`TODO: more complete eval results`
| Benchmark                        | Phi-4 mini-Ins | phi4-mini-int4wo |
|----------------------------------|----------------|------------------|
| **Popular aggregated benchmark** |                |                  |
| mmlu (0-shot)                    |                | 63.56            |
| mmlu_pro (5-shot)                |                | 36.74            |
| **Reasoning**                    |                |                  |
| arc_challenge (0-shot)           |                | 54.86            |
| gpqa_main_zeroshot               |                | 30.58            |
| HellaSwag                        | 54.57          | 53.54            |
| openbookqa                       |                | 34.40            |
| piqa (0-shot)                    |                | 76.33            |
| social_iqa                       |                | 47.90            |
| truthfulqa_mc2 (0-shot)          |                | 46.44            |
| winogrande (0-shot)              |                | 71.51            |
| **Multilingual**                 |                |                  |
| mgsm_en_cot_en                   |                | 59.6             |
| **Math**                         |                |                  |
| gsm8k (5-shot)                   |                | 74.37            |
| mathqa (0-shot)                  |                | 42.75            |
| **Overall**                      | **TODO**       | **TODO**         |
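The **Overall** row is still marked TODO. If it is intended as an unweighted mean of the per-task int4wo scores (an assumption — the README does not specify the aggregation, and a weighted or grouped average may be intended instead), it can be sketched as:

```python
# Sketch: unweighted mean of the phi4-mini-int4wo scores reported in the table.
# Assumption: "Overall" is a plain average; the README does not say.
scores = {
    "mmlu (0-shot)": 63.56,
    "mmlu_pro (5-shot)": 36.74,
    "arc_challenge (0-shot)": 54.86,
    "gpqa_main_zeroshot": 30.58,
    "HellaSwag": 53.54,
    "openbookqa": 34.40,
    "piqa (0-shot)": 76.33,
    "social_iqa": 47.90,
    "truthfulqa_mc2 (0-shot)": 46.44,
    "winogrande (0-shot)": 71.51,
    "mgsm_en_cot_en": 59.6,
    "gsm8k (5-shot)": 74.37,
    "mathqa (0-shot)": 42.75,
}
overall = sum(scores.values()) / len(scores)
print(f"{overall:.2f}")  # → 53.28
```

The same computation would fill the Phi-4 mini-Ins column once its per-task scores are recorded.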

# Model Performance