Salyut1 committed · Commit 531df31 · verified · 1 Parent(s): 2a557b8

Update README.md

Files changed (1):
  1. README.md +83 -0

README.md CHANGED
@@ -16,6 +16,89 @@ pipeline_tag: text-generation

  Check the [original model card](https://huggingface.co/zai-org/GLM-4.7) for information about this model.

+ ---
+
+ ### **MMLU Benchmark Results: Salyut1/GLM-4.7-NVFP4**
+ #### **Summary Table**
+ | Groups | Version | Metric | Value | Stderr |
+ | --- | --- | --- | --- | --- |
+ | **MMLU (Total)** | 2 | acc ↑ | **0.8348** | ± 0.0030 |
+ | **Social Sciences** | 2 | acc ↑ | **0.9051** | ± 0.0052 |
+ | **Other** | 2 | acc ↑ | **0.8684** | ± 0.0058 |
+ | **STEM** | 2 | acc ↑ | **0.8351** | ± 0.0064 |
+ | **Humanities** | 2 | acc ↑ | **0.7664** | ± 0.0059 |
+ #### **STEM**
+ | Tasks | n-shot | Metric | Value | Stderr |
+ | --- | --- | --- | --- | --- |
+ | High School Biology | 0 | acc ↑ | 0.9516 | ± 0.0122 |
+ | College Biology | 0 | acc ↑ | 0.9514 | ± 0.0180 |
+ | Astronomy | 0 | acc ↑ | 0.9474 | ± 0.0182 |
+ | High School Computer Science | 0 | acc ↑ | 0.9300 | ± 0.0256 |
+ | Conceptual Physics | 0 | acc ↑ | 0.9064 | ± 0.0190 |
+ | Elementary Mathematics | 0 | acc ↑ | 0.8862 | ± 0.0164 |
+ | Electrical Engineering | 0 | acc ↑ | 0.8690 | ± 0.0281 |
+ | High School Statistics | 0 | acc ↑ | 0.8565 | ± 0.0239 |
+ | College Computer Science | 0 | acc ↑ | 0.8400 | ± 0.0368 |
+ | Anatomy | 0 | acc ↑ | 0.8296 | ± 0.0325 |
+ | High School Physics | 0 | acc ↑ | 0.7947 | ± 0.0330 |
+ | High School Chemistry | 0 | acc ↑ | 0.7882 | ± 0.0287 |
+ | Machine Learning | 0 | acc ↑ | 0.7679 | ± 0.0401 |
+ | College Physics | 0 | acc ↑ | 0.7647 | ± 0.0422 |
+ | Abstract Algebra | 0 | acc ↑ | 0.6800 | ± 0.0469 |
+ | College Chemistry | 0 | acc ↑ | 0.6800 | ± 0.0469 |
+ | College Mathematics | 0 | acc ↑ | 0.6800 | ± 0.0469 |
+ | High School Mathematics | 0 | acc ↑ | 0.6481 | ± 0.0291 |
+ #### **Social Sciences**
+ | Tasks | n-shot | Metric | Value | Stderr |
+ | --- | --- | --- | --- | --- |
+ | High School Government/Politics | 0 | acc ↑ | 0.9793 | ± 0.0103 |
+ | High School Microeconomics | 0 | acc ↑ | 0.9706 | ± 0.0110 |
+ | High School Psychology | 0 | acc ↑ | 0.9523 | ± 0.0091 |
+ | Human Sexuality | 0 | acc ↑ | 0.9313 | ± 0.0222 |
+ | Sociology | 0 | acc ↑ | 0.9204 | ± 0.0191 |
+ | High School Geography | 0 | acc ↑ | 0.9192 | ± 0.0194 |
+ | High School Macroeconomics | 0 | acc ↑ | 0.9000 | ± 0.0152 |
+ | US Foreign Policy | 0 | acc ↑ | 0.9000 | ± 0.0302 |
+ | Professional Psychology | 0 | acc ↑ | 0.8725 | ± 0.0135 |
+ | Security Studies | 0 | acc ↑ | 0.8653 | ± 0.0219 |
+ | Public Relations | 0 | acc ↑ | 0.7636 | ± 0.0407 |
+ | Econometrics | 0 | acc ↑ | 0.7544 | ± 0.0405 |
+ #### **Humanities**
+ | Tasks | n-shot | Metric | Value | Stderr |
+ | --- | --- | --- | --- | --- |
+ | High School US History | 0 | acc ↑ | 0.9461 | ± 0.0159 |
+ | High School World History | 0 | acc ↑ | 0.9367 | ± 0.0158 |
+ | World Religions | 0 | acc ↑ | 0.9064 | ± 0.0223 |
+ | Prehistory | 0 | acc ↑ | 0.8981 | ± 0.0168 |
+ | International Law | 0 | acc ↑ | 0.8926 | ± 0.0283 |
+ | Jurisprudence | 0 | acc ↑ | 0.8889 | ± 0.0304 |
+ | Logical Fallacies | 0 | acc ↑ | 0.8834 | ± 0.0252 |
+ | High School European History | 0 | acc ↑ | 0.8788 | ± 0.0255 |
+ | Moral Disputes | 0 | acc ↑ | 0.8699 | ± 0.0181 |
+ | Philosophy | 0 | acc ↑ | 0.8617 | ± 0.0196 |
+ | Formal Logic | 0 | acc ↑ | 0.7460 | ± 0.0389 |
+ | Professional Law | 0 | acc ↑ | 0.6610 | ± 0.0121 |
+ | Moral Scenarios | 0 | acc ↑ | 0.6425 | ± 0.0160 |
+ #### **Other**
+ | Tasks | n-shot | Metric | Value | Stderr |
+ | --- | --- | --- | --- | --- |
+ | Medical Genetics | 0 | acc ↑ | 0.9800 | ± 0.0141 |
+ | Marketing | 0 | acc ↑ | 0.9530 | ± 0.0139 |
+ | Miscellaneous | 0 | acc ↑ | 0.9374 | ± 0.0087 |
+ | Professional Medicine | 0 | acc ↑ | 0.9301 | ± 0.0155 |
+ | Clinical Knowledge | 0 | acc ↑ | 0.9057 | ± 0.0180 |
+ | Nutrition | 0 | acc ↑ | 0.9052 | ± 0.0168 |
+ | Management | 0 | acc ↑ | 0.8932 | ± 0.0306 |
+ | Business Ethics | 0 | acc ↑ | 0.8600 | ± 0.0349 |
+ | Computer Security | 0 | acc ↑ | 0.8600 | ± 0.0349 |
+ | Human Aging | 0 | acc ↑ | 0.8161 | ± 0.0260 |
+ | College Medicine | 0 | acc ↑ | 0.7977 | ± 0.0306 |
+ | Professional Accounting | 0 | acc ↑ | 0.7624 | ± 0.0254 |
+ | Global Facts | 0 | acc ↑ | 0.6500 | ± 0.0479 |
+ | Virology | 0 | acc ↑ | 0.5723 | ± 0.0385 |
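
The tables above appear to follow the lm-evaluation-harness report layout (zero-shot `mmlu` group scores with `acc` and stderr). Below is a minimal sketch of how a comparable run could be launched, assuming lm-eval 0.4+ with its vLLM backend installed; the tensor-parallel size and memory fraction are placeholders, not the settings actually used for these numbers.

```python
# Hedged reproduction sketch -- not the exact command used to produce the tables above.
# Assumes: pip install "lm_eval[vllm]" and enough GPU memory for the NVFP4 checkpoint.
import lm_eval

results = lm_eval.simple_evaluate(
    model="vllm",
    model_args=(
        "pretrained=Salyut1/GLM-4.7-NVFP4,"
        "tensor_parallel_size=4,"          # assumption: match your GPU count
        "gpu_memory_utilization=0.90"      # assumption
    ),
    tasks=["mmlu"],   # same group/subtask breakdown as reported above
    num_fewshot=0,    # the tables report n-shot = 0
    batch_size="auto",
)

# Print per-group and per-subtask metrics (what the tables above summarize).
for task_name, metrics in results["results"].items():
    print(task_name, metrics)
```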
+
+ ---
+
  vLLM Inference Note:

  I needed to patch `vllm/model_executor/models/glm4_moe.py` to skip specific k_scale and v_scale parameters when they are missing from the checkpoint, rather than crashing. The script below fixed my k_scale and v_scale errors.
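
A minimal sketch of the kind of guard described above, assuming the usual weight-loading pattern (iterate the checkpoint tensors, then verify nothing was left unloaded); this is not the author's script and not the upstream `glm4_moe.py` code.

```python
# Illustrative only: generic shape of a load_weights() that tolerates
# k_scale / v_scale parameters absent from the checkpoint instead of failing.

def load_weights(self, weights):
    params_dict = dict(self.named_parameters())
    loaded_params = set()

    for name, loaded_weight in weights:
        param = params_dict[name]
        weight_loader = getattr(param, "weight_loader", None)
        if weight_loader is not None:
            weight_loader(param, loaded_weight)
        else:
            param.data.copy_(loaded_weight)
        loaded_params.add(name)

    missing = set(params_dict) - loaded_params
    # The checkpoint ships no per-layer KV-cache scales; tolerate that and
    # keep the parameters' default values rather than aborting the load.
    missing = {n for n in missing if not n.endswith((".k_scale", ".v_scale"))}
    if missing:
        raise RuntimeError(f"Missing weights in checkpoint: {sorted(missing)}")
```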