Update README.md

Pankaj Mathur committed · Commit 8aded8e · 1 Parent(s): 0c3d4df

README.md CHANGED
@@ -26,14 +26,10 @@ Here are the zero shot metrics results.
 |:------:|:-------------:|:---------:|:--------:|:-------:|:--------:|
 |**Task**|**num_fewshot**|**Version**|**Metric**|**Value**|**Stderr**|
 |*arc_easy*|0|0|acc|0.7386|0.0090|
-|*arc_easy*|0|0|acc_norm|0.7066|0.0093|
-|*hellaswag*|0|0|acc|0.5591|0.0050|
 |*hellaswag*|0|0|acc_norm|0.7394|0.0044|
-|*truthfulqa_mc*|0|1|mc1|0.2938|0.0159|
 |*truthfulqa_mc*|0|1|mc2|0.4399|0.0153|
-|*mmlu
-|*
-|*Total Zero Shot Average*|0|-|-|0.5373|0.011|
+|*mmlu*|0|1|acc_norm|0.4108|0.0153|
+|*Total Zero Shot Average*|0|-|-|0.5821|0.011|
 
 
 Here are the results on metrics used by [HuggingFaceH4 Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
@@ -43,8 +39,12 @@ please note num_fewshots varies for each below task as used by HuggingFaceH4 Ope
 |||||||
 |:------:|:-------------:|:---------:|:--------:|:-------:|:--------:|
 |**Task**|**num_fewshot**|**Version**|**Metric**|**Value**|**Stderr**|
-|*arc_challenge*|25|0|acc|0.4846|0.0146|
 |*arc_challenge*|25|0|acc_norm|0.5077|0.0146|
+|*hellaswag*|10|0|acc_norm|0.7617|0.0043|
+|*mmlu*|5|0|acc_norm|-|-|
+|*truthfulqa_mc*|0|1|mc2|0.4399|0.0153|
+|*Total Average*|0|-|-|0.5697|0.0114|
+
 
 
 
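The two "Total" rows added in this commit look like unweighted means of the rows above them. The sketch below is a quick sanity check under that assumption, not the author's script: the dictionary names are hypothetical, the values are copied from the updated rows, and the 5-shot mmlu row (reported as "-") is assumed to be left out of the average. Under these assumptions the means come out at roughly 0.5822 and 0.5698, which agrees with the reported 0.5821 and 0.5697 up to the last digit.

```python
from statistics import mean

# Values copied from the updated README rows (metric as listed per row:
# acc for arc_easy, acc_norm or mc2 for the others).
zero_shot = {
    "arc_easy": 0.7386,
    "hellaswag": 0.7394,
    "truthfulqa_mc": 0.4399,
    "mmlu": 0.4108,
}

# Leaderboard-style rows; the 5-shot mmlu value is "-" in the README,
# so it is skipped here (assumption).
leaderboard = {
    "arc_challenge": 0.5077,
    "hellaswag": 0.7617,
    "truthfulqa_mc": 0.4399,
}

# Plain unweighted means over the listed values (assumed aggregation).
zero_shot_mean = mean(list(zero_shot.values()))
leaderboard_mean = mean(list(leaderboard.values()))

print(f"zero-shot mean:   {zero_shot_mean:.6f}")
print(f"leaderboard mean: {leaderboard_mean:.6f}")
```

The small last-digit difference would be consistent with the README values having been truncated rather than rounded, though the commit itself does not say how they were computed.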