Locutusque/Hyperion-3.0-Mistral-7B-alpha

@@ -81,6 +81,81 @@ Zero-shot AGIEval
 | - agieval_sat_math              |      1|none  |None  |acc     |0.3091|±  |0.0312|
 |                                 |       |none  |None  |acc_norm|0.2364|±  |0.0287|
+5-shot CoT MMLU
+|                            Tasks                            |Version|  Filter  |n-shot|  Metric   |Value |   |Stderr|
+|-------------------------------------------------------------|-------|----------|-----:|-----------|-----:|---|-----:|
+|mmlu_flan_cot_fewshot                                        |N/A    |get-answer|     0|exact_match|0.5924|±  |0.0118|
+| - mmlu_flan_cot_fewshot_humanities                          |N/A    |get-answer|     0|exact_match|0.5077|±  |0.0206|
+|  - mmlu_flan_cot_fewshot_formal_logic                       |      0|get-answer|     0|exact_match|0.2143|±  |0.1138|
+|  - mmlu_flan_cot_fewshot_high_school_european_history       |      0|get-answer|     0|exact_match|0.6111|±  |0.1182|
+|  - mmlu_flan_cot_fewshot_high_school_us_history             |      0|get-answer|     0|exact_match|0.7727|±  |0.0914|
+|  - mmlu_flan_cot_fewshot_high_school_world_history          |      0|get-answer|     0|exact_match|0.6154|±  |0.0973|
+|  - mmlu_flan_cot_fewshot_international_law                  |      0|get-answer|     0|exact_match|0.9231|±  |0.0769|
+|  - mmlu_flan_cot_fewshot_jurisprudence                      |      0|get-answer|     0|exact_match|0.3636|±  |0.1521|
+|  - mmlu_flan_cot_fewshot_logical_fallacies                  |      0|get-answer|     0|exact_match|0.7222|±  |0.1086|
+|  - mmlu_flan_cot_fewshot_moral_disputes                     |      0|get-answer|     0|exact_match|0.5526|±  |0.0817|
+|  - mmlu_flan_cot_fewshot_moral_scenarios                    |      0|get-answer|     0|exact_match|0.3900|±  |0.0490|
+|  - mmlu_flan_cot_fewshot_philosophy                         |      0|get-answer|     0|exact_match|0.7647|±  |0.0738|
+|  - mmlu_flan_cot_fewshot_prehistory                         |      0|get-answer|     0|exact_match|0.7143|±  |0.0775|
+|  - mmlu_flan_cot_fewshot_professional_law                   |      0|get-answer|     0|exact_match|0.3471|±  |0.0366|
+|  - mmlu_flan_cot_fewshot_world_religions                    |      0|get-answer|     0|exact_match|0.8947|±  |0.0723|
+| - mmlu_flan_cot_fewshot_other                               |N/A    |get-answer|     0|exact_match|0.6921|±  |0.0240|
+|  - mmlu_flan_cot_fewshot_business_ethics                    |      0|get-answer|     0|exact_match|0.9091|±  |0.0909|
+|  - mmlu_flan_cot_fewshot_clinical_knowledge                 |      0|get-answer|     0|exact_match|0.5517|±  |0.0940|
+|  - mmlu_flan_cot_fewshot_college_medicine                   |      0|get-answer|     0|exact_match|0.7727|±  |0.0914|
+|  - mmlu_flan_cot_fewshot_global_facts                       |      0|get-answer|     0|exact_match|0.6000|±  |0.1633|
+|  - mmlu_flan_cot_fewshot_human_aging                        |      0|get-answer|     0|exact_match|0.6522|±  |0.1015|
+|  - mmlu_flan_cot_fewshot_management                         |      0|get-answer|     0|exact_match|0.9091|±  |0.0909|
+|  - mmlu_flan_cot_fewshot_marketing                          |      0|get-answer|     0|exact_match|0.8400|±  |0.0748|
+|  - mmlu_flan_cot_fewshot_medical_genetics                   |      0|get-answer|     0|exact_match|1.0000|±  |0.0000|
+|  - mmlu_flan_cot_fewshot_miscellaneous                      |      0|get-answer|     0|exact_match|0.7791|±  |0.0450|
+|  - mmlu_flan_cot_fewshot_nutrition                          |      0|get-answer|     0|exact_match|0.6667|±  |0.0833|
+|  - mmlu_flan_cot_fewshot_professional_accounting            |      0|get-answer|     0|exact_match|0.4194|±  |0.0901|
+|  - mmlu_flan_cot_fewshot_professional_medicine              |      0|get-answer|     0|exact_match|0.6774|±  |0.0853|
+|  - mmlu_flan_cot_fewshot_virology                           |      0|get-answer|     0|exact_match|0.3889|±  |0.1182|
+| - mmlu_flan_cot_fewshot_social_sciences                     |N/A    |get-answer|     0|exact_match|0.6973|±  |0.0239|
+|  - mmlu_flan_cot_fewshot_econometrics                       |      0|get-answer|     0|exact_match|0.3333|±  |0.1421|
+|  - mmlu_flan_cot_fewshot_high_school_geography              |      0|get-answer|     0|exact_match|0.9091|±  |0.0627|
+|  - mmlu_flan_cot_fewshot_high_school_government_and_politics|      0|get-answer|     0|exact_match|0.8095|±  |0.0878|
+|  - mmlu_flan_cot_fewshot_high_school_macroeconomics         |      0|get-answer|     0|exact_match|0.6279|±  |0.0746|
+|  - mmlu_flan_cot_fewshot_high_school_microeconomics         |      0|get-answer|     0|exact_match|0.6154|±  |0.0973|
+|  - mmlu_flan_cot_fewshot_high_school_psychology             |      0|get-answer|     0|exact_match|0.9167|±  |0.0360|
+|  - mmlu_flan_cot_fewshot_human_sexuality                    |      0|get-answer|     0|exact_match|0.5000|±  |0.1508|
+|  - mmlu_flan_cot_fewshot_professional_psychology            |      0|get-answer|     0|exact_match|0.6667|±  |0.0572|
+|  - mmlu_flan_cot_fewshot_public_relations                   |      0|get-answer|     0|exact_match|0.5833|±  |0.1486|
+|  - mmlu_flan_cot_fewshot_security_studies                   |      0|get-answer|     0|exact_match|0.4444|±  |0.0975|
+|  - mmlu_flan_cot_fewshot_sociology                          |      0|get-answer|     0|exact_match|0.7727|±  |0.0914|
+|  - mmlu_flan_cot_fewshot_us_foreign_policy                  |      0|get-answer|     0|exact_match|0.7273|±  |0.1408|
+| - mmlu_flan_cot_fewshot_stem                                |N/A    |get-answer|     0|exact_match|0.5164|±  |0.0265|
+|  - mmlu_flan_cot_fewshot_abstract_algebra                   |      0|get-answer|     0|exact_match|0.4545|±  |0.1575|
+|  - mmlu_flan_cot_fewshot_anatomy                            |      0|get-answer|     0|exact_match|0.3571|±  |0.1329|
+|  - mmlu_flan_cot_fewshot_astronomy                          |      0|get-answer|     0|exact_match|0.5000|±  |0.1291|
+|  - mmlu_flan_cot_fewshot_college_biology                    |      0|get-answer|     0|exact_match|0.5625|±  |0.1281|
+|  - mmlu_flan_cot_fewshot_college_chemistry                  |      0|get-answer|     0|exact_match|0.3750|±  |0.1830|
+|  - mmlu_flan_cot_fewshot_college_computer_science           |      0|get-answer|     0|exact_match|0.2727|±  |0.1408|
+|  - mmlu_flan_cot_fewshot_college_mathematics                |      0|get-answer|     0|exact_match|0.2727|±  |0.1408|
+|  - mmlu_flan_cot_fewshot_college_physics                    |      0|get-answer|     0|exact_match|0.4545|±  |0.1575|
+|  - mmlu_flan_cot_fewshot_computer_security                  |      0|get-answer|     0|exact_match|0.7273|±  |0.1408|
+|  - mmlu_flan_cot_fewshot_conceptual_physics                 |      0|get-answer|     0|exact_match|0.6154|±  |0.0973|
+|  - mmlu_flan_cot_fewshot_electrical_engineering             |      0|get-answer|     0|exact_match|0.6875|±  |0.1197|
+|  - mmlu_flan_cot_fewshot_elementary_mathematics             |      0|get-answer|     0|exact_match|0.7317|±  |0.0701|
+|  - mmlu_flan_cot_fewshot_high_school_biology                |      0|get-answer|     0|exact_match|0.7188|±  |0.0808|
+|  - mmlu_flan_cot_fewshot_high_school_chemistry              |      0|get-answer|     0|exact_match|0.3636|±  |0.1050|
+|  - mmlu_flan_cot_fewshot_high_school_computer_science       |      0|get-answer|     0|exact_match|0.6667|±  |0.1667|
+|  - mmlu_flan_cot_fewshot_high_school_mathematics            |      0|get-answer|     0|exact_match|0.4138|±  |0.0931|
+|  - mmlu_flan_cot_fewshot_high_school_physics                |      0|get-answer|     0|exact_match|0.2353|±  |0.1060|
+|  - mmlu_flan_cot_fewshot_high_school_statistics             |      0|get-answer|     0|exact_match|0.4348|±  |0.1057|
+|  - mmlu_flan_cot_fewshot_machine_learning                   |      0|get-answer|     0|exact_match|0.3636|±  |0.1521|
+
+|                 Groups                 |Version|  Filter  |n-shot|  Metric   |Value |   |Stderr|
+|----------------------------------------|-------|----------|-----:|-----------|-----:|---|-----:|
+|mmlu_flan_cot_fewshot                   |N/A    |get-answer|     0|exact_match|0.5924|±  |0.0118|
+| - mmlu_flan_cot_fewshot_humanities     |N/A    |get-answer|     0|exact_match|0.5077|±  |0.0206|
+| - mmlu_flan_cot_fewshot_other          |N/A    |get-answer|     0|exact_match|0.6921|±  |0.0240|
+| - mmlu_flan_cot_fewshot_social_sciences|N/A    |get-answer|     0|exact_match|0.6973|±  |0.0239|
+| - mmlu_flan_cot_fewshot_stem           |N/A    |get-answer|     0|exact_match|0.5164|±  |0.0265|
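Many per-subject Stderr values in the tables above are around 0.1 or larger, which suggests that each subject was scored on only a handful of questions. The sketch below estimates that question count from a row's Value and Stderr. It assumes the reported Stderr is the standard error of the mean of 0/1 scores computed with the n-1 sample variance; the tables do not state this, and the helper name `estimate_sample_size` is hypothetical.

```python
from typing import Optional


def estimate_sample_size(value: float, stderr: float) -> Optional[int]:
    """Estimate how many questions produced a (Value, Stderr) pair.

    Assumption: stderr = sqrt(value * (1 - value) / (n - 1)), i.e. the
    standard error of the mean of 0/1 scores using the sample variance.
    Returns None when stderr is 0, because n cannot be recovered then
    (for example a subject where every answer was correct).
    """
    if stderr <= 0:
        return None
    return round(value * (1.0 - value) / stderr ** 2) + 1


# Rows copied from the table above: (task, Value, Stderr).
rows = [
    ("formal_logic", 0.2143, 0.1138),
    ("business_ethics", 0.9091, 0.0909),
    ("medical_genetics", 1.0000, 0.0000),
]

for task, value, stderr in rows:
    print(task, estimate_sample_size(value, stderr))
# formal_logic 14
# business_ethics 11
# medical_genetics None
```

Under this assumption, per-subject scores rest on roughly ten to fifteen questions each, so differences between individual subjects are best read as rough indications; the group-level rows aggregate more questions and carry much smaller standard errors.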
 ## How to Use
 ```python
 from transformers import AutoModelForCausalLM, AutoTokenizer