Update README.md
README.md
@@ -24,7 +24,27 @@ Samples were drawn from a diverse mix of publicly available datasets spanning co

### Quality

MMLU-Pro results (thanks to Lavd for providing these):

| Category | Correct | Total | Accuracy |
|---|---:|---:|---:|
| Math | 1279 | 1351 | 94.7% |
| Biology | 675 | 717 | 94.1% |
| Physics | 1188 | 1299 | 91.5% |
| Chemistry | 1035 | 1132 | 91.4% |
| Business | 715 | 789 | 90.6% |
| Computer Science | 366 | 410 | 89.3% |
| Economics | 748 | 844 | 88.6% |
| Psychology | 674 | 798 | 84.5% |
| Health | 686 | 818 | 83.9% |
| Other | 767 | 924 | 83.0% |
| Engineering | 790 | 969 | 81.5% |
| Philosophy | 395 | 499 | 79.2% |
| History | 279 | 381 | 73.2% |
| Law | 778 | 1101 | 70.7% |
| **Overall** | **10375** | **12032** | **86.2%** |

You should always evaluate against your specific use case.
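
Judging by the counts, the Accuracy column is Correct / Total for each category, and the **Overall** row pools all 12032 questions (10375 / 12032) rather than averaging the 14 category percentages, so larger categories such as Math and Physics carry more weight. The minimal Python sketch below recomputes both figures from the table, and its `tally` helper shows one way to build the same per-category counts when evaluating on your own data. All names (`RESULTS`, `accuracy`, `pooled_accuracy`, `mean_category_accuracy`, `tally`) and the `(category, is_correct)` record format are illustrative assumptions, not part of MMLU-Pro or any evaluation harness.

```python
"""Recompute the MMLU-Pro accuracy figures from per-category counts.

A sketch: every name here is illustrative, not from any evaluation harness.
"""

from collections import defaultdict
from collections.abc import Iterable

# (correct, total) per category, copied from the README table.
RESULTS: dict[str, tuple[int, int]] = {
    "Math": (1279, 1351),
    "Biology": (675, 717),
    "Physics": (1188, 1299),
    "Chemistry": (1035, 1132),
    "Business": (715, 789),
    "Computer Science": (366, 410),
    "Economics": (748, 844),
    "Psychology": (674, 798),
    "Health": (686, 818),
    "Other": (767, 924),
    "Engineering": (790, 969),
    "Philosophy": (395, 499),
    "History": (279, 381),
    "Law": (778, 1101),
}


def accuracy(correct: int, total: int) -> float:
    """Return accuracy as a percentage; reject empty categories."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


def pooled_accuracy(results: dict[str, tuple[int, int]]) -> float:
    """Micro-average: all correct answers over all questions (the Overall row)."""
    correct = sum(c for c, _ in results.values())
    total = sum(t for _, t in results.values())
    return accuracy(correct, total)


def mean_category_accuracy(results: dict[str, tuple[int, int]]) -> float:
    """Macro-average: unweighted mean of the per-category percentages."""
    scores = [accuracy(c, t) for c, t in results.values()]
    return sum(scores) / len(scores)


def tally(records: Iterable[tuple[str, bool]]) -> dict[str, tuple[int, int]]:
    """Build (correct, total) counts from hypothetical per-question outcomes."""
    counts: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for category, is_correct in records:
        counts[category][0] += int(is_correct)  # correct answers
        counts[category][1] += 1  # questions seen
    return {name: (c, t) for name, (c, t) in counts.items()}


if __name__ == "__main__":
    for name, (c, t) in RESULTS.items():
        print(f"{name:<17}{c:>6} /{t:>6} = {accuracy(c, t):.1f}%")
    print(f"Overall (pooled):   {pooled_accuracy(RESULTS):.1f}%")
    print(f"Mean of categories: {mean_category_accuracy(RESULTS):.1f}%")
```

Run as a script, it prints each category's accuracy, matching the table, followed by the pooled 86.2%. The unweighted mean of the 14 categories comes out lower, at about 85.4%, because it gives a small category such as History (381 questions) the same weight as Math (1351).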

### How to Run
