zhiyucheng committed on
Commit f35b051 · verified · 1 Parent(s): 7019896

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +51 -1
README.md CHANGED

@@ -36,7 +36,7 @@ Global <br>
  Developers looking to take off-the-shelf, pre-quantized models for deployment in AI Agent systems, chatbots, RAG systems, and other AI-powered applications. <br>
 
  ### Release Date: <br>
- Huggingface 03/06/2026 via https://huggingface.co/nvidia/GLM-5-NVFP4 <br>
+ Huggingface 03/16/2026 via https://huggingface.co/nvidia/GLM-5-NVFP4 <br>
 
  ## Model Architecture:
  **Architecture Type:** Transformers <br>

@@ -112,6 +112,56 @@ To serve this checkpoint with [SGLang](https://github.com/sgl-project/sglang), y
  python3 -m sglang.launch_server --model nvidia/GLM-5-NVFP4 --tensor-parallel-size 8 --quantization modelopt_fp4 --tool-call-parser glm47 --reasoning-parser glm45 --trust-remote-code --chunked-prefill-size 131072 --mem-fraction-static 0.80
  ```
 
+ If you would like to enable expert parallelism when launching the SGLang endpoint, please build a Docker image with the provided [dockerfile](https://huggingface.co/nvidia/GLM-5-NVFP4/blob/main/dockerfile).
+
+ ## Evaluation
+ The accuracy benchmark results are presented in the table below:
+ <table>
+ <tr>
+ <td><strong>Precision</strong></td>
+ <td><strong>MMLU Pro</strong></td>
+ <td><strong>GPQA Diamond</strong></td>
+ <td><strong>SciCode</strong></td>
+ <td><strong>IFBench</strong></td>
+ <td><strong>HLE</strong></td>
+ </tr>
+ <tr>
+ <td>FP8</td>
+ <td>0.858</td>
+ <td>0.862</td>
+ <td>0.488</td>
+ <td>0.717</td>
+ <td>0.274</td>
+ </tr>
+ <tr>
+ <td>NVFP4</td>
+ <td>0.861</td>
+ <td>0.855</td>
+ <td>0.478</td>
+ <td>0.712</td>
+ <td>0.275</td>
+ </tr>
+ </table>
+
 
  ## Model Limitations:
  The base model was trained on data that contains toxic language and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses, especially when prompted with toxic prompts. The model may generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, producing socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive.
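
The `sglang.launch_server` command in the diff above starts an OpenAI-compatible HTTP server. As a minimal sketch of how a client might query it — the base URL (SGLang's default port 30000), the `/v1/chat/completions` path, and the prompt text are assumptions here, not part of the model card:

```python
import json
import urllib.request


def build_chat_request(prompt: str, model: str = "nvidia/GLM-5-NVFP4") -> dict:
    """Build an OpenAI-style chat-completions payload for the served model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }


def query_server(prompt: str, base_url: str = "http://localhost:30000") -> str:
    """POST the payload to the (assumed) local SGLang endpoint and return the reply text."""
    data = json.dumps(build_chat_request(prompt)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style response: first choice's message content holds the reply.
    return body["choices"][0]["message"]["content"]
```

With the server from the README running locally, `query_server("Hello")` would return the model's reply; only the standard library is used, so no extra client dependency is needed.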