SnifferCaptain committed on
Commit 9eeee8a · verified · 1 Parent(s): 3527d6f

Update README.md

  1. README.md +54 -54
README.md CHANGED
@@ -52,65 +52,65 @@ YModel2 is the most powerful large language model (LLM) trained by SnifferCaptai
  Model benchmark results, obtained with the lm_eval framework:
  | Groups |Version|Filter|n-shot| Metric | |Value | |Stderr|
  |-----------|------:|------|------|--------|---|-----:|---|-----:|
- |ceval-valid| 2|none | |acc |↑ |0.2452|± |0.0117|
  <details style="color:rgb(128,128,128)">
  <summary>ceval bench result</summary>
 
  | Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
  |----------------------------------------------------|------:|------|-----:|--------|---|-----:|---|-----:|
- |ceval-valid | 2|none | |acc |↑ |0.2452|± |0.0117|
- |ceval-valid_accountant | 2|none | 0|acc |↑ |0.2449|± |0.0621|
- |ceval-valid_advanced_mathematics | 2|none | 0|acc |↑ |0.2632|± |0.1038|
- |ceval-valid_art_studies | 2|none | 0|acc |↑ |0.1212|± |0.0577|
- |ceval-valid_basic_medicine | 2|none | 0|acc |↑ |0.0000|± |0.0000|
- |ceval-valid_business_administration | 2|none | 0|acc |↑ |0.3636|± |0.0850|
- |ceval-valid_chinese_language_and_literature | 2|none | 0|acc |↑ |0.2609|± |0.0936|
- |ceval-valid_civil_servant | 2|none | 0|acc |↑ |0.2766|± |0.0660|
- |ceval-valid_clinical_medicine | 2|none | 0|acc |↑ |0.2273|± |0.0914|
- |ceval-valid_college_chemistry | 2|none | 0|acc |↑ |0.1250|± |0.0690|
- |ceval-valid_college_economics | 2|none | 0|acc |↑ |0.3818|± |0.0661|
- |ceval-valid_college_physics | 2|none | 0|acc |↑ |0.2632|± |0.1038|
- |ceval-valid_college_programming | 2|none | 0|acc |↑ |0.2973|± |0.0762|
- |ceval-valid_computer_architecture | 2|none | 0|acc |↑ |0.2381|± |0.0952|
- |ceval-valid_computer_network | 2|none | 0|acc |↑ |0.0526|± |0.0526|
- |ceval-valid_discrete_mathematics | 2|none | 0|acc |↑ |0.3125|± |0.1197|
- |ceval-valid_education_science | 2|none | 0|acc |↑ |0.4828|± |0.0944|
- |ceval-valid_electrical_engineer | 2|none | 0|acc |↑ |0.2703|± |0.0740|
- |ceval-valid_environmental_impact_assessment_engineer| 2|none | 0|acc |↑ |0.1935|± |0.0721|
- |ceval-valid_fire_engineer | 2|none | 0|acc |↑ |0.3871|± |0.0889|
- |ceval-valid_high_school_biology | 2|none | 0|acc |↑ |0.3684|± |0.1137|
- |ceval-valid_high_school_chemistry | 2|none | 0|acc |↑ |0.1579|± |0.0859|
- |ceval-valid_high_school_chinese | 2|none | 0|acc |↑ |0.2632|± |0.1038|
- |ceval-valid_high_school_geography | 2|none | 0|acc |↑ |0.2105|± |0.0961|
- |ceval-valid_high_school_history | 2|none | 0|acc |↑ |0.3000|± |0.1051|
- |ceval-valid_high_school_mathematics | 2|none | 0|acc |↑ |0.2222|± |0.1008|
- |ceval-valid_high_school_physics | 2|none | 0|acc |↑ |0.2105|± |0.0961|
- |ceval-valid_high_school_politics | 2|none | 0|acc |↑ |0.3684|± |0.1137|
- |ceval-valid_ideological_and_moral_cultivation | 2|none | 0|acc |↑ |0.3684|± |0.1137|
- |ceval-valid_law | 2|none | 0|acc |↑ |0.2083|± |0.0847|
- |ceval-valid_legal_professional | 2|none | 0|acc |↑ |0.1304|± |0.0718|
- |ceval-valid_logic | 2|none | 0|acc |↑ |0.2727|± |0.0972|
- |ceval-valid_mao_zedong_thought | 2|none | 0|acc |↑ |0.2500|± |0.0903|
- |ceval-valid_marxism | 2|none | 0|acc |↑ |0.2105|± |0.0961|
- |ceval-valid_metrology_engineer | 2|none | 0|acc |↑ |0.0833|± |0.0576|
- |ceval-valid_middle_school_biology | 2|none | 0|acc |↑ |0.2381|± |0.0952|
- |ceval-valid_middle_school_chemistry | 2|none | 0|acc |↑ |0.2500|± |0.0993|
- |ceval-valid_middle_school_geography | 2|none | 0|acc |↑ |0.2500|± |0.1306|
- |ceval-valid_middle_school_history | 2|none | 0|acc |↑ |0.2727|± |0.0972|
- |ceval-valid_middle_school_mathematics | 2|none | 0|acc |↑ |0.1579|± |0.0859|
- |ceval-valid_middle_school_physics | 2|none | 0|acc |↑ |0.2105|± |0.0961|
- |ceval-valid_middle_school_politics | 2|none | 0|acc |↑ |0.1905|± |0.0878|
- |ceval-valid_modern_chinese_history | 2|none | 0|acc |↑ |0.1304|± |0.0718|
- |ceval-valid_operating_system | 2|none | 0|acc |↑ |0.4211|± |0.1164|
- |ceval-valid_physician | 2|none | 0|acc |↑ |0.2449|± |0.0621|
- |ceval-valid_plant_protection | 2|none | 0|acc |↑ |0.3182|± |0.1016|
- |ceval-valid_probability_and_statistics | 2|none | 0|acc |↑ |0.1111|± |0.0762|
- |ceval-valid_professional_tour_guide | 2|none | 0|acc |↑ |0.3448|± |0.0898|
- |ceval-valid_sports_science | 2|none | 0|acc |↑ |0.2632|± |0.1038|
- |ceval-valid_tax_accountant | 2|none | 0|acc |↑ |0.1633|± |0.0533|
- |ceval-valid_teacher_qualification | 2|none | 0|acc |↑ |0.1364|± |0.0523|
- |ceval-valid_urban_and_rural_planner | 2|none | 0|acc |↑ |0.2174|± |0.0615|
- |ceval-valid_veterinary_medicine | 2|none | 0|acc |↑ |0.2609|± |0.0936|
  </details>
 
  Below are the model's question-answering outputs (because the model is very small, a higher repetition penalty is recommended):
 
  Model benchmark results, obtained with the lm_eval framework:
  | Groups |Version|Filter|n-shot| Metric | |Value | |Stderr|
  |-----------|------:|------|------|--------|---|-----:|---|-----:|
+ |ceval-valid| 2|none | 5|acc |↑ |0.2422|± |0.0117|
  <details style="color:rgb(128,128,128)">
  <summary>ceval bench result</summary>
 
  | Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
  |----------------------------------------------------|------:|------|-----:|--------|---|-----:|---|-----:|
+ |ceval-valid | 2|none | |acc |↑ |0.2422|± |0.0117|
+ |ceval-valid_accountant | 2|none | 5|acc |↑ |0.2449|± |0.0621|
+ |ceval-valid_advanced_mathematics | 2|none | 5|acc |↑ |0.1053|± |0.0723|
+ |ceval-valid_art_studies | 2|none | 5|acc |↑ |0.1818|± |0.0682|
+ |ceval-valid_basic_medicine | 2|none | 5|acc |↑ |0.2105|± |0.0961|
+ |ceval-valid_business_administration | 2|none | 5|acc |↑ |0.2424|± |0.0758|
+ |ceval-valid_chinese_language_and_literature | 2|none | 5|acc |↑ |0.3043|± |0.0981|
+ |ceval-valid_civil_servant | 2|none | 5|acc |↑ |0.1489|± |0.0525|
+ |ceval-valid_clinical_medicine | 2|none | 5|acc |↑ |0.3182|± |0.1016|
+ |ceval-valid_college_chemistry | 2|none | 5|acc |↑ |0.2500|± |0.0903|
+ |ceval-valid_college_economics | 2|none | 5|acc |↑ |0.2727|± |0.0606|
+ |ceval-valid_college_physics | 2|none | 5|acc |↑ |0.4211|± |0.1164|
+ |ceval-valid_college_programming | 2|none | 5|acc |↑ |0.2973|± |0.0762|
+ |ceval-valid_computer_architecture | 2|none | 5|acc |↑ |0.3810|± |0.1086|
+ |ceval-valid_computer_network | 2|none | 5|acc |↑ |0.1579|± |0.0859|
+ |ceval-valid_discrete_mathematics | 2|none | 5|acc |↑ |0.3750|± |0.1250|
+ |ceval-valid_education_science | 2|none | 5|acc |↑ |0.3103|± |0.0874|
+ |ceval-valid_electrical_engineer | 2|none | 5|acc |↑ |0.2973|± |0.0762|
+ |ceval-valid_environmental_impact_assessment_engineer| 2|none | 5|acc |↑ |0.1613|± |0.0672|
+ |ceval-valid_fire_engineer | 2|none | 5|acc |↑ |0.2258|± |0.0763|
+ |ceval-valid_high_school_biology | 2|none | 5|acc |↑ |0.1053|± |0.0723|
+ |ceval-valid_high_school_chemistry | 2|none | 5|acc |↑ |0.1579|± |0.0859|
+ |ceval-valid_high_school_chinese | 2|none | 5|acc |↑ |0.1053|± |0.0723|
+ |ceval-valid_high_school_geography | 2|none | 5|acc |↑ |0.3158|± |0.1096|
+ |ceval-valid_high_school_history | 2|none | 5|acc |↑ |0.3500|± |0.1094|
+ |ceval-valid_high_school_mathematics | 2|none | 5|acc |↑ |0.2222|± |0.1008|
+ |ceval-valid_high_school_physics | 2|none | 5|acc |↑ |0.1579|± |0.0859|
+ |ceval-valid_high_school_politics | 2|none | 5|acc |↑ |0.5789|± |0.1164|
+ |ceval-valid_ideological_and_moral_cultivation | 2|none | 5|acc |↑ |0.3158|± |0.1096|
+ |ceval-valid_law | 2|none | 5|acc |↑ |0.1250|± |0.0690|
+ |ceval-valid_legal_professional | 2|none | 5|acc |↑ |0.2174|± |0.0879|
+ |ceval-valid_logic | 2|none | 5|acc |↑ |0.2273|± |0.0914|
+ |ceval-valid_mao_zedong_thought | 2|none | 5|acc |↑ |0.2083|± |0.0847|
+ |ceval-valid_marxism | 2|none | 5|acc |↑ |0.3158|± |0.1096|
+ |ceval-valid_metrology_engineer | 2|none | 5|acc |↑ |0.2083|± |0.0847|
+ |ceval-valid_middle_school_biology | 2|none | 5|acc |↑ |0.3810|± |0.1086|
+ |ceval-valid_middle_school_chemistry | 2|none | 5|acc |↑ |0.2500|± |0.0993|
+ |ceval-valid_middle_school_geography | 2|none | 5|acc |↑ |0.0833|± |0.0833|
+ |ceval-valid_middle_school_history | 2|none | 5|acc |↑ |0.1818|± |0.0842|
+ |ceval-valid_middle_school_mathematics | 2|none | 5|acc |↑ |0.2632|± |0.1038|
+ |ceval-valid_middle_school_physics | 2|none | 5|acc |↑ |0.4737|± |0.1177|
+ |ceval-valid_middle_school_politics | 2|none | 5|acc |↑ |0.2381|± |0.0952|
+ |ceval-valid_modern_chinese_history | 2|none | 5|acc |↑ |0.1739|± |0.0808|
+ |ceval-valid_operating_system | 2|none | 5|acc |↑ |0.1579|± |0.0859|
+ |ceval-valid_physician | 2|none | 5|acc |↑ |0.2041|± |0.0582|
+ |ceval-valid_plant_protection | 2|none | 5|acc |↑ |0.2273|± |0.0914|
+ |ceval-valid_probability_and_statistics | 2|none | 5|acc |↑ |0.2222|± |0.1008|
+ |ceval-valid_professional_tour_guide | 2|none | 5|acc |↑ |0.3103|± |0.0874|
+ |ceval-valid_sports_science | 2|none | 5|acc |↑ |0.1579|± |0.0859|
+ |ceval-valid_tax_accountant | 2|none | 5|acc |↑ |0.2041|± |0.0582|
+ |ceval-valid_teacher_qualification | 2|none | 5|acc |↑ |0.2727|± |0.0679|
+ |ceval-valid_urban_and_rural_planner | 2|none | 5|acc |↑ |0.1739|± |0.0565|
+ |ceval-valid_veterinary_medicine | 2|none | 5|acc |↑ |0.2609|± |0.0936|
  </details>
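
The `Value` and `Stderr` columns above can be related with a little arithmetic. The sketch below is an illustration, not the evaluation code itself: it assumes each task's accuracy is the mean of 0/1 scores and that the standard error is the sample (n - 1) standard deviation divided by the square root of n. The question counts (49 and 19) and the helper name `binomial_stderr` are assumptions chosen because they reproduce the reported fractions.

```python
import math


def binomial_stderr(correct: int, total: int) -> float:
    """Standard error of the mean of 0/1 scores, using the sample (n - 1) variance."""
    if total < 2:
        raise ValueError("need at least two scored questions")
    p = correct / total
    return math.sqrt(p * (1 - p) / (total - 1))


# Assumed counts: 12 of 49 correct matches the accountant row.
print(f"{12 / 49:.4f} ± {binomial_stderr(12, 49):.4f}")  # 0.2449 ± 0.0621

# Assumed counts: 2 of 19 correct matches the 5-shot advanced_mathematics row.
print(f"{2 / 19:.4f} ± {binomial_stderr(2, 19):.4f}")  # 0.1053 ± 0.0723
```

Under these assumptions, the large per-subject errors follow from the small per-subject question counts. The aggregate score changed from 0.2452 to 0.2422 in this commit, a difference of 0.0030, which is smaller than the reported standard error of 0.0117.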
 
  Below are the model's question-answering outputs (because the model is very small, a higher repetition penalty is recommended):