SnifferCaptain committed · Commit 3527d6f · verified · 1 Parent(s): 4e70444

Update README.md

Files changed (1)
  1. README.md +60 -2
README.md CHANGED
@@ -49,10 +49,68 @@ YModel2 is the most powerful large language model (LLM) trained by SnifferCaptai
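  ## Model Performance
  The model was benchmarked on several datasets; the scores are for fun and rough reference only:
 
- Benchmark results, obtained with the lm_eval framework:
  <details style="color:rgb(128,128,128)">
  <summary>ceval bench result</summary>
- null
  </details>
 
  Below is the model's question-answering output (because the model is very small, a larger repetition penalty is recommended):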
 
  ## Model Performance
  The model was benchmarked on several datasets; the scores are for fun and rough reference only:
 
+ Benchmark results, obtained with the lm_eval framework:
+ | Groups |Version|Filter|n-shot| Metric | |Value | |Stderr|
+ |-----------|------:|------|------|--------|---|-----:|---|-----:|
+ |ceval-valid| 2|none | |acc |↑ |0.2452|± |0.0117|
  <details style="color:rgb(128,128,128)">
  <summary>ceval bench result</summary>
+
+ | Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
+ |----------------------------------------------------|------:|------|-----:|--------|---|-----:|---|-----:|
+ |ceval-valid | 2|none | |acc |↑ |0.2452|± |0.0117|
+ |ceval-valid_accountant | 2|none | 0|acc |↑ |0.2449|± |0.0621|
+ |ceval-valid_advanced_mathematics | 2|none | 0|acc |↑ |0.2632|± |0.1038|
+ |ceval-valid_art_studies | 2|none | 0|acc |↑ |0.1212|± |0.0577|
+ |ceval-valid_basic_medicine | 2|none | 0|acc |↑ |0.0000|± |0.0000|
+ |ceval-valid_business_administration | 2|none | 0|acc |↑ |0.3636|± |0.0850|
+ |ceval-valid_chinese_language_and_literature | 2|none | 0|acc |↑ |0.2609|± |0.0936|
+ |ceval-valid_civil_servant | 2|none | 0|acc |↑ |0.2766|± |0.0660|
+ |ceval-valid_clinical_medicine | 2|none | 0|acc |↑ |0.2273|± |0.0914|
+ |ceval-valid_college_chemistry | 2|none | 0|acc |↑ |0.1250|± |0.0690|
+ |ceval-valid_college_economics | 2|none | 0|acc |↑ |0.3818|± |0.0661|
+ |ceval-valid_college_physics | 2|none | 0|acc |↑ |0.2632|± |0.1038|
+ |ceval-valid_college_programming | 2|none | 0|acc |↑ |0.2973|± |0.0762|
+ |ceval-valid_computer_architecture | 2|none | 0|acc |↑ |0.2381|± |0.0952|
+ |ceval-valid_computer_network | 2|none | 0|acc |↑ |0.0526|± |0.0526|
+ |ceval-valid_discrete_mathematics | 2|none | 0|acc |↑ |0.3125|± |0.1197|
+ |ceval-valid_education_science | 2|none | 0|acc |↑ |0.4828|± |0.0944|
+ |ceval-valid_electrical_engineer | 2|none | 0|acc |↑ |0.2703|± |0.0740|
+ |ceval-valid_environmental_impact_assessment_engineer| 2|none | 0|acc |↑ |0.1935|± |0.0721|
+ |ceval-valid_fire_engineer | 2|none | 0|acc |↑ |0.3871|± |0.0889|
+ |ceval-valid_high_school_biology | 2|none | 0|acc |↑ |0.3684|± |0.1137|
+ |ceval-valid_high_school_chemistry | 2|none | 0|acc |↑ |0.1579|± |0.0859|
+ |ceval-valid_high_school_chinese | 2|none | 0|acc |↑ |0.2632|± |0.1038|
+ |ceval-valid_high_school_geography | 2|none | 0|acc |↑ |0.2105|± |0.0961|
+ |ceval-valid_high_school_history | 2|none | 0|acc |↑ |0.3000|± |0.1051|
+ |ceval-valid_high_school_mathematics | 2|none | 0|acc |↑ |0.2222|± |0.1008|
+ |ceval-valid_high_school_physics | 2|none | 0|acc |↑ |0.2105|± |0.0961|
+ |ceval-valid_high_school_politics | 2|none | 0|acc |↑ |0.3684|± |0.1137|
+ |ceval-valid_ideological_and_moral_cultivation | 2|none | 0|acc |↑ |0.3684|± |0.1137|
+ |ceval-valid_law | 2|none | 0|acc |↑ |0.2083|± |0.0847|
+ |ceval-valid_legal_professional | 2|none | 0|acc |↑ |0.1304|± |0.0718|
+ |ceval-valid_logic | 2|none | 0|acc |↑ |0.2727|± |0.0972|
+ |ceval-valid_mao_zedong_thought | 2|none | 0|acc |↑ |0.2500|± |0.0903|
+ |ceval-valid_marxism | 2|none | 0|acc |↑ |0.2105|± |0.0961|
+ |ceval-valid_metrology_engineer | 2|none | 0|acc |↑ |0.0833|± |0.0576|
+ |ceval-valid_middle_school_biology | 2|none | 0|acc |↑ |0.2381|± |0.0952|
+ |ceval-valid_middle_school_chemistry | 2|none | 0|acc |↑ |0.2500|± |0.0993|
+ |ceval-valid_middle_school_geography | 2|none | 0|acc |↑ |0.2500|± |0.1306|
+ |ceval-valid_middle_school_history | 2|none | 0|acc |↑ |0.2727|± |0.0972|
+ |ceval-valid_middle_school_mathematics | 2|none | 0|acc |↑ |0.1579|± |0.0859|
+ |ceval-valid_middle_school_physics | 2|none | 0|acc |↑ |0.2105|± |0.0961|
+ |ceval-valid_middle_school_politics | 2|none | 0|acc |↑ |0.1905|± |0.0878|
+ |ceval-valid_modern_chinese_history | 2|none | 0|acc |↑ |0.1304|± |0.0718|
+ |ceval-valid_operating_system | 2|none | 0|acc |↑ |0.4211|± |0.1164|
+ |ceval-valid_physician | 2|none | 0|acc |↑ |0.2449|± |0.0621|
+ |ceval-valid_plant_protection | 2|none | 0|acc |↑ |0.3182|± |0.1016|
+ |ceval-valid_probability_and_statistics | 2|none | 0|acc |↑ |0.1111|± |0.0762|
+ |ceval-valid_professional_tour_guide | 2|none | 0|acc |↑ |0.3448|± |0.0898|
+ |ceval-valid_sports_science | 2|none | 0|acc |↑ |0.2632|± |0.1038|
+ |ceval-valid_tax_accountant | 2|none | 0|acc |↑ |0.1633|± |0.0533|
+ |ceval-valid_teacher_qualification | 2|none | 0|acc |↑ |0.1364|± |0.0523|
+ |ceval-valid_urban_and_rural_planner | 2|none | 0|acc |↑ |0.2174|± |0.0615|
+ |ceval-valid_veterinary_medicine | 2|none | 0|acc |↑ |0.2609|± |0.0936|
  </details>
 
  Below is the model's question-answering output (because the model is very small, a larger repetition penalty is recommended):
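
Reviewer note on reading the `Stderr` column: the reported values appear consistent with the sample standard error of a 0/1 accuracy, `sqrt(acc * (1 - acc) / (n - 1))`. The sketch below is a rough check under stated assumptions, not a description of how lm_eval computes the column. The question counts (49 and 19) are not in the README; they are inferred from the reported fractions (0.2449 ≈ 12/49, 0.0526 ≈ 1/19). The chance level of 0.25 assumes four-option multiple-choice questions. The helper names are hypothetical.

```python
import math


def binomial_stderr(acc: float, n: int) -> float:
    """Sample standard error of the mean of n 0/1 outcomes (n - 1 denominator)."""
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


def z_vs_chance(acc: float, stderr: float, chance: float = 0.25) -> float:
    """Distance between acc and the chance level, in units of stderr."""
    return (acc - chance) / stderr


# Assumption: 49 and 19 questions, inferred from the reported accuracies.
accountant = binomial_stderr(12 / 49, 49)
network = binomial_stderr(1 / 19, 19)

# Aggregate row from the table: acc = 0.2452, stderr = 0.0117.
overall_z = z_vs_chance(0.2452, 0.0117)

print(f"{accountant:.4f}")  # 0.0621, matches ceval-valid_accountant
print(f"{network:.4f}")     # 0.0526, matches ceval-valid_computer_network
print(f"{overall_z:.2f}")   # -0.41
```

Under these assumptions the aggregate score sits less than one standard error from the 0.25 chance level, and most per-subject rows rest on only a few dozen questions, which fits the README's own caveat that the numbers are for rough reference only.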