lukealonso committed on
Commit 1e0ec7a · verified · 1 parent: 580f841

Update README.md

Files changed (1)
  1. README.md +21 -1
README.md CHANGED
@@ -24,7 +24,27 @@ Samples were drawn from a diverse mix of publicly available datasets spanning co
 
 ### Quality
 
- Initial testing has been very positive, but you should evaluate against your specific use case.
+ MMLU-Pro results (thanks to Lavd for providing these):
+
+ | Category | Correct | Total | Accuracy |
+ |---|---:|---:|---:|
+ | Math | 1279 | 1351 | 94.7% |
+ | Biology | 675 | 717 | 94.1% |
+ | Physics | 1188 | 1299 | 91.5% |
+ | Chemistry | 1035 | 1132 | 91.4% |
+ | Business | 715 | 789 | 90.6% |
+ | Computer Science | 366 | 410 | 89.3% |
+ | Economics | 748 | 844 | 88.6% |
+ | Psychology | 674 | 798 | 84.5% |
+ | Health | 686 | 818 | 83.9% |
+ | Other | 767 | 924 | 83.0% |
+ | Engineering | 790 | 969 | 81.5% |
+ | Philosophy | 395 | 499 | 79.2% |
+ | History | 279 | 381 | 73.2% |
+ | Law | 778 | 1101 | 70.7% |
+ | **Overall** | **10375** | **12032** | **86.2%** |
+
+ You should always evaluate against your specific use case.
 
 ### How to Run
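
For readers who want to sanity-check the added MMLU-Pro table, the following is a minimal sketch that recomputes each accuracy and the overall row from the per-category counts. The counts are copied from the diff above; the names `RESULTS`, `accuracy`, and `overall` are illustrative and are not part of the README or of any evaluation harness. It assumes the overall figure is question-weighted (total correct divided by total questions), which is consistent with the numbers shown.

```python
# Per-category (correct, total) counts copied from the MMLU-Pro table in the diff.
RESULTS = {
    "Math": (1279, 1351),
    "Biology": (675, 717),
    "Physics": (1188, 1299),
    "Chemistry": (1035, 1132),
    "Business": (715, 789),
    "Computer Science": (366, 410),
    "Economics": (748, 844),
    "Psychology": (674, 798),
    "Health": (686, 818),
    "Other": (767, 924),
    "Engineering": (790, 969),
    "Philosophy": (395, 499),
    "History": (279, 381),
    "Law": (778, 1101),
}


def accuracy(correct: int, total: int) -> float:
    """Return accuracy as a percentage."""
    return 100.0 * correct / total


def overall(results: dict[str, tuple[int, int]]) -> tuple[int, int, float]:
    """Sum counts across categories and return (correct, total, accuracy %)."""
    correct = sum(c for c, _ in results.values())
    total = sum(t for _, t in results.values())
    return correct, total, accuracy(correct, total)


if __name__ == "__main__":
    # Print one row per category, then the question-weighted overall row.
    for name, (c, t) in RESULTS.items():
        print(f"{name:<18}{c:>6}{t:>7}{accuracy(c, t):>8.1f}%")
    c, t, acc = overall(RESULTS)
    print(f"{'Overall':<18}{c:>6}{t:>7}{acc:>8.1f}%")
```

Because the overall row is weighted by question count, large categories such as Math and Law influence it more than small ones such as History.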