JunHowie committed
Commit a9d2a7f · verified · 1 Parent(s): a6c4031

Add files using upload-large-folder tool

This view is limited to 50 files because it contains too many changes.
Files changed (50)
  1. .gitattributes +2 -0
  2. .mdl +0 -0
  3. .msc +0 -0
  4. .mv +1 -0
  5. README.md +225 -0
  6. chat_template.jinja +86 -0
  7. config.json +75 -0
  8. generation_config.json +12 -0
  9. model-00082-of-00141.safetensors +3 -0
  10. model-00084-of-00141.safetensors +3 -0
  11. model-00085-of-00141.safetensors +3 -0
  12. model-00087-of-00141.safetensors +3 -0
  13. model-00089-of-00141.safetensors +3 -0
  14. model-00090-of-00141.safetensors +3 -0
  15. model-00091-of-00141.safetensors +3 -0
  16. model-00092-of-00141.safetensors +3 -0
  17. model-00093-of-00141.safetensors +3 -0
  18. model-00094-of-00141.safetensors +3 -0
  19. model-00095-of-00141.safetensors +3 -0
  20. model-00097-of-00141.safetensors +3 -0
  21. model-00099-of-00141.safetensors +3 -0
  22. model-00100-of-00141.safetensors +3 -0
  23. model-00101-of-00141.safetensors +3 -0
  24. model-00102-of-00141.safetensors +3 -0
  25. model-00104-of-00141.safetensors +3 -0
  26. model-00105-of-00141.safetensors +3 -0
  27. model-00107-of-00141.safetensors +3 -0
  28. model-00108-of-00141.safetensors +3 -0
  29. model-00109-of-00141.safetensors +3 -0
  30. model-00112-of-00141.safetensors +3 -0
  31. model-00114-of-00141.safetensors +3 -0
  32. model-00115-of-00141.safetensors +3 -0
  33. model-00116-of-00141.safetensors +3 -0
  34. model-00117-of-00141.safetensors +3 -0
  35. model-00118-of-00141.safetensors +3 -0
  36. model-00119-of-00141.safetensors +3 -0
  37. model-00123-of-00141.safetensors +3 -0
  38. model-00124-of-00141.safetensors +3 -0
  39. model-00128-of-00141.safetensors +3 -0
  40. model-00129-of-00141.safetensors +3 -0
  41. model-00130-of-00141.safetensors +3 -0
  42. model-00131-of-00141.safetensors +3 -0
  43. model-00132-of-00141.safetensors +3 -0
  44. model-00133-of-00141.safetensors +3 -0
  45. model-00135-of-00141.safetensors +3 -0
  46. model-00138-of-00141.safetensors +3 -0
  47. model-00139-of-00141.safetensors +3 -0
  48. model-00141-of-00141.safetensors +3 -0
  49. model.safetensors.index.json +3 -0
  50. tokenizer.json +3 -0
.gitattributes CHANGED
@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+model.safetensors.index.json filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
.mdl ADDED
Binary file (39 Bytes).
.msc ADDED
Binary file (12.7 kB).
.mv ADDED
@@ -0,0 +1 @@
Revision:master,CreatedAt:1772131132
README.md ADDED
@@ -0,0 +1,225 @@
---
library_name: transformers
license: mit
pipeline_tag: text-to-text
tags:
- vLLM
- AWQ
base_model:
- ZhipuAI/GLM-5
base_model_relation: quantized
---
# GLM-5-AWQ
Base model: [ZhipuAI/GLM-5](https://www.modelscope.cn/models/ZhipuAI/GLM-5)

This repository contains an AWQ-quantized build of the base model, produced with data-free quantization (no calibration dataset required).

### 【Dependencies / Installation】

```python
# NOTE: vllm==0.16.0rc2 does NOT work; upgrade to >=0.16.1rc1.
vllm>=0.16.1rc1.dev7
transformers>=5.3.0.dev0
```

As of **2026-02-26**, make sure your system has CUDA 12.8 installed.

Then, create a fresh Python environment (e.g. a Python 3.12 venv) and run:
```bash
pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
pip install git+https://github.com/huggingface/transformers.git
pip install git+https://github.com/deepseek-ai/DeepGEMM.git@v2.1.1.post3 --no-build-isolation
```
[vLLM Official Guide](https://docs.vllm.ai/projects/recipes/en/latest/Qwen/Qwen3.5.html)
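A coarse sanity check of the version floors above can be scripted. This is a hypothetical helper that compares only the numeric release segment, so `rc`/`dev` suffixes are deliberately ignored; stricter ordering would need PEP 440-aware tooling such as `packaging`:

```python
import re

MIN_VLLM = (0, 16, 1)  # vllm==0.16.0rc2 is known-broken; >=0.16.1rc1 is required

def release_tuple(version: str) -> tuple:
    # Keep only the leading numeric release segment: '0.16.1rc1.dev7' -> (0, 16, 1).
    m = re.match(r"[0-9]+(?:\.[0-9]+)*", version)
    return tuple(int(p) for p in m.group(0).split("."))

assert release_tuple("0.16.0rc2") < MIN_VLLM        # the broken release is rejected
assert release_tuple("0.16.1rc1.dev7") >= MIN_VLLM  # the required floor passes
```

In practice you would feed it `importlib.metadata.version("vllm")` from the active environment.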

### 【vLLM Startup Command】
<i>Note: When launching with TP=8, include `--enable-expert-parallel`;
otherwise the expert tensors will not be evenly sharded across GPU devices.</i>

```bash
export VLLM_USE_DEEP_GEMM=0
export VLLM_USE_FLASHINFER_MOE_FP16=1
export VLLM_USE_FLASHINFER_SAMPLER=0
export OMP_NUM_THREADS=4

vllm serve \
  __YOUR_PATH__/tclf90/GLM-5-AWQ \
  --served-model-name MY_MODEL \
  --swap-space 16 \
  --max-num-seqs 32 \
  --max-model-len 32768 \
  --gpu-memory-utilization 0.9 \
  --tensor-parallel-size 8 \
  --enable-expert-parallel \
  --enable-auto-tool-choice \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --speculative-config '{"method":"mtp","num_speculative_tokens":1}' \
  --trust-remote-code \
  --host 0.0.0.0 \
  --port 8000
```
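Once the server is up, the OpenAI-compatible endpoint can be exercised with the standard library alone; the model name and port below match the flags above, the sampling values mirror `generation_config.json`, and the prompt is of course a placeholder:

```python
import json
from urllib import request

# Chat Completions payload for the server launched above.
payload = {
    "model": "MY_MODEL",  # matches --served-model-name
    "messages": [{"role": "user", "content": "Hello, GLM-5!"}],
    "max_tokens": 256,
    "temperature": 1.0,   # defaults from generation_config.json
    "top_p": 0.95,
}
req = request.Request(
    "http://localhost:8000/v1/chat/completions",  # matches --host/--port
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
# Uncomment once the server is running:
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```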

### 【Logs】
```
2026-02-26
1. Initial commit
```

### 【Model Files】
| File Size | Last Updated |
|-----------|--------------|
| `392 GiB` | `2026-02-26` |

### 【Model Download】
```python
from modelscope import snapshot_download
snapshot_download('tclf90/GLM-5-AWQ', cache_dir="your_local_path")
```
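`snapshot_download` returns the local directory it materializes the files into; here is a minimal sketch of wiring that path into the serve command from above (the path literal is a hypothetical stand-in so the sketch runs without downloading 392 GiB):

```python
# Stand-in for the directory snapshot_download(...) would return.
model_dir = "your_local_path/tclf90/GLM-5-AWQ"

serve_cmd = [
    "vllm", "serve", model_dir,
    "--served-model-name", "MY_MODEL",
    "--tensor-parallel-size", "8",
    "--enable-expert-parallel",  # needed at TP=8, per the startup note above
]
print(" ".join(serve_cmd))
```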
83
+
84
+ ### 【Overview】
85
+
86
+ # GLM-5
87
+
88
+ <div align="center">
89
+ <img src=https://raw.githubusercontent.com/zai-org/GLM-5/refs/heads/main/resources/logo.svg width="15%"/>
90
+ </div>
91
+ <p align="center">
92
+ 👋 Join our <a href="https://raw.githubusercontent.com/zai-org/GLM-5/refs/heads/main/resources/wechat.png" target="_blank">WeChat</a> or <a href="https://discord.gg/QR7SARHRxK" target="_blank">Discord</a> community.
93
+ <br>
94
+ 📖 Check out the GLM-5 <a href="https://z.ai/blog/glm-5" target="_blank">technical blog</a>.
95
+ <br>
96
+ 📍 Use GLM-5 API services on <a href="https://docs.z.ai/guides/llm/glm-5">Z.ai API Platform. </a>
97
+ <br>
98
+ 👉 One click to <a href="https://chat.z.ai">GLM-5</a>.
99
+ </p>
100
+
101
+ ## Introduction
102
+
103
+ We are launching GLM-5, targeting complex systems engineering and long-horizon agentic tasks. Scaling is still one of the most important ways to improve the intelligence efficiency of Artificial General Intelligence (AGI). Compared to GLM-4.5, GLM-5 scales from 355B parameters (32B active) to 744B parameters (40B active), and increases pre-training data from 23T to 28.5T tokens. GLM-5 also integrates DeepSeek Sparse Attention (DSA), largely reducing deployment cost while preserving long-context capacity.
104
+
105
+ Reinforcement learning aims to bridge the gap between competence and excellence in pre-trained models. However, deploying it at scale for LLMs is a challenge due to the RL training inefficiency. To this end, we developed [slime](https://github.com/THUDM/slime), a novel **asynchronous RL infrastructure** that substantially improves training throughput and efficiency, enabling more fine-grained post-training iterations. With advances in both pre-training and post-training, GLM-5 delivers significant improvement compared to GLM-4.7 across a wide range of academic benchmarks and achieves best-in-class performance among all open-source models in the world on reasoning, coding, and agentic tasks, closing the gap with frontier models.
106
+
## Benchmark

| | GLM-5 | GLM-4.7 | DeepSeek-V3.2 | Kimi K2.5 | Claude Opus 4.5 | Gemini 3 Pro | GPT-5.2 (xhigh) |
| -------------------------------- | ------------- | --------- | ------------- | --------- | --------------- | ------------ | --------------- |
| HLE | 30.5 | 24.8 | 25.1 | 31.5 | 28.4 | 37.2 | 35.4 |
| HLE (w/ Tools) | 50.4 | 42.8 | 40.8 | 51.8 | 43.4* | 45.8* | 45.5* |
| AIME 2026 I | 92.7 | 92.9 | 92.7 | 92.5 | 93.3 | 90.6 | - |
| HMMT Nov. 2025 | 96.9 | 93.5 | 90.2 | 91.1 | 91.7 | 93.0 | 97.1 |
| IMOAnswerBench | 82.5 | 82.0 | 78.3 | 81.8 | 78.5 | 83.3 | 86.3 |
| GPQA-Diamond | 86.0 | 85.7 | 82.4 | 87.6 | 87.0 | 91.9 | 92.4 |
| SWE-bench Verified | 77.8 | 73.8 | 73.1 | 76.8 | 80.9 | 76.2 | 80.0 |
| SWE-bench Multilingual | 73.3 | 66.7 | 70.2 | 73.0 | 77.5 | 65.0 | 72.0 |
| Terminal-Bench 2.0 (Terminus 2) | 56.2 / 60.7 † | 41.0 | 39.3 | 50.8 | 59.3 | 54.2 | 54.0 |
| Terminal-Bench 2.0 (Claude Code) | 56.2 / 61.1 † | 32.8 | 46.4 | - | 57.9 | - | - |
| CyberGym | 43.2 | 23.5 | 17.3 | 41.3 | 50.6 | 39.9 | - |
| BrowseComp | 62.0 | 52.0 | 51.4 | 60.6 | 37.0 | 37.8 | - |
| BrowseComp (w/ Context Manage) | 75.9 | 67.5 | 67.6 | 74.9 | 67.8 | 59.2 | 65.8 |
| BrowseComp-Zh | 72.7 | 66.6 | 65.0 | 62.3 | 62.4 | 66.8 | 76.1 |
| τ²-Bench | 89.7 | 87.4 | 85.3 | 80.2 | 91.6 | 90.7 | 85.5 |
| MCP-Atlas (Public Set) | 67.8 | 52.0 | 62.2 | 63.8 | 65.2 | 66.6 | 68.0 |
| Tool-Decathlon | 38.0 | 23.8 | 35.2 | 27.8 | 43.5 | 36.4 | 46.3 |
| Vending Bench 2 | $4,432.12 | $2,376.82 | $1,034.00 | $1,198.46 | $4,967.06 | $5,478.16 | $3,591.33 |

> *: Scores on the full set.
>
> †: A verified version of Terminal-Bench 2.0 that fixes some ambiguous instructions.

See the footnotes below for more evaluation details.

### Footnotes

* **Humanity’s Last Exam (HLE) & other reasoning tasks**: We evaluate with a maximum generation length of 131,072 tokens (`temperature=1.0, top_p=0.95, max_new_tokens=131072`). By default, we report the text-only subset; results marked with * are from the full set. We use GPT-5.2 (medium) as the judge model. For HLE-with-tools, we use a maximum context length of 202,752 tokens.
* **SWE-bench & SWE-bench Multilingual**: We run the SWE-bench suite with OpenHands using a tailored instruction prompt. Settings: `temperature=0.7, top_p=0.95, max_new_tokens=16384`, with a 200K context window.
* **BrowseComp**: Without context management, we retain details from the most recent 5 turns. With context management, we use the same discard-all strategy as DeepSeek-v3.2 and Kimi K2.5.
* **Terminal-Bench 2.0 (Terminus 2)**: We evaluate with the Terminus framework using `timeout=2h, temperature=0.7, top_p=1.0, max_new_tokens=8192`, with a 128K context window. Resource limits are capped at 16 CPUs and 32 GB RAM.
* **Terminal-Bench 2.0 (Claude Code)**: We evaluate in Claude Code 2.1.14 (think mode, default effort) with `temperature=1.0, top_p=0.95, max_new_tokens=65536`. We remove wall-clock time limits due to generation speed, while preserving per-task CPU and memory constraints. Scores are averaged over 5 runs. We fix environment issues introduced by Claude Code and also report results on a verified Terminal-Bench 2.0 dataset that resolves ambiguous instructions (see: [https://huggingface.co/datasets/zai-org/terminal-bench-2-verified](https://huggingface.co/datasets/zai-org/terminal-bench-2-verified)).
* **CyberGym**: We evaluate in Claude Code 2.1.18 (think mode, no web tools) with `temperature=1.0, top_p=1.0, max_new_tokens=32000` and a 250-minute timeout per task. Results are single-run Pass@1 over 1,507 tasks.
* **MCP-Atlas**: All models are evaluated in think mode on the 500-task public subset with a 10-minute timeout per task. We use Gemini 3 Pro as the judge model.
* **τ²-bench**: We add a small prompt adjustment in Retail and Telecom to avoid failures caused by premature user termination. For Airline, we apply the domain fixes proposed in the Claude Opus 4.5 system card.
* **Vending Bench 2**: Runs are conducted independently by [Andon Labs](https://andonlabs.com/evals/vending-bench-2).


## Serve GLM-5 Locally

### Prepare environment

vLLM, SGLang, and xLLM all support local deployment of GLM-5. A brief deployment guide follows.

+ vLLM

Using Docker:

```shell
docker pull vllm/vllm-openai:nightly
```

or using pip:

```shell
pip install -U vllm --pre --index-url https://pypi.org/simple --extra-index-url https://wheels.vllm.ai/nightly
```

then upgrade transformers:

```shell
pip install git+https://github.com/huggingface/transformers.git
```

+ SGLang

Using Docker:
```bash
docker pull lmsysorg/sglang:glm5-hopper    # For Hopper GPUs
docker pull lmsysorg/sglang:glm5-blackwell # For Blackwell GPUs
```

### Deploy

+ vLLM

```shell
vllm serve zai-org/GLM-5-FP8 \
  --tensor-parallel-size 8 \
  --gpu-memory-utilization 0.85 \
  --speculative-config.method mtp \
  --speculative-config.num_speculative_tokens 1 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --enable-auto-tool-choice \
  --served-model-name glm-5-fp8
```

Check the [recipes](https://github.com/vllm-project/recipes/blob/main/GLM/GLM5.md) for more details.

+ SGLang

```shell
python3 -m sglang.launch_server \
  --model-path zai-org/GLM-5-FP8 \
  --tp-size 8 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --mem-fraction-static 0.85 \
  --served-model-name glm-5-fp8
```

Check the [sglang cookbook](https://cookbook.sglang.io/autoregressive/GLM/GLM-5) for more details.

+ xLLM and other Ascend NPUs

Please check the deployment guide [here](https://github.com/zai-org/GLM-5/blob/main/example/ascend.md).

## Citation

Our technical report is coming soon.
chat_template.jinja ADDED
@@ -0,0 +1,86 @@
[gMASK]<sop>
{%- if tools -%}
<|system|>
# Tools

You may call one or more functions to assist with the user query.

You are provided with function signatures within <tools></tools> XML tags:
<tools>
{% for tool in tools %}
{{ tool | tojson(ensure_ascii=False) }}
{% endfor %}
</tools>

For each function call, output the function name and arguments within the following XML format:
<tool_call>{function-name}<arg_key>{arg-key-1}</arg_key><arg_value>{arg-value-1}</arg_value><arg_key>{arg-key-2}</arg_key><arg_value>{arg-value-2}</arg_value>...</tool_call>{%- endif -%}
{%- macro visible_text(content) -%}
{%- if content is string -%}
{{- content }}
{%- elif content is iterable and content is not mapping -%}
{%- for item in content -%}
{%- if item is mapping and item.type == 'text' -%}
{{- item.text }}
{%- elif item is string -%}
{{- item }}
{%- endif -%}
{%- endfor -%}
{%- else -%}
{{- content }}
{%- endif -%}
{%- endmacro -%}
{%- set ns = namespace(last_user_index=-1) %}
{%- for m in messages %}
{%- if m.role == 'user' %}
{% set ns.last_user_index = loop.index0 -%}
{%- endif %}
{%- endfor %}
{% for m in messages %}
{%- if m.role == 'user' -%}<|user|>{{ visible_text(m.content) }}
{%- elif m.role == 'assistant' -%}
<|assistant|>
{%- set reasoning_content = '' %}
{%- set content = visible_text(m.content) %}
{%- if m.reasoning_content is string %}
{%- set reasoning_content = m.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if ((clear_thinking is defined and not clear_thinking) or loop.index0 > ns.last_user_index) and reasoning_content -%}
{{ '<think>' + reasoning_content.strip() + '</think>'}}
{%- else -%}
{{ '</think>' }}
{%- endif -%}
{%- if content.strip() -%}
{{ content.strip() }}
{%- endif -%}
{% if m.tool_calls %}
{% for tc in m.tool_calls %}
{%- if tc.function %}
{%- set tc = tc.function %}
{%- endif %}
{{- '<tool_call>' + tc.name -}}
{% set _args = tc.arguments %}{% for k, v in _args.items() %}<arg_key>{{ k }}</arg_key><arg_value>{{ v | tojson(ensure_ascii=False) if v is not string else v }}</arg_value>{% endfor %}</tool_call>{% endfor %}
{% endif %}
{%- elif m.role == 'tool' -%}
{%- if m.content is string -%}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|observation|>' }}
{%- endif %}
{{- '<tool_response>' }}
{{- m.content }}
{{- '</tool_response>' }}
{%- else -%}
<|observation|>{% for tr in m.content %}
<tool_response>{{ tr.output if tr.output is defined else tr }}</tool_response>{% endfor -%}
{% endif -%}
{%- elif m.role == 'system' -%}
<|system|>{{ visible_text(m.content) }}
{%- endif -%}
{%- endfor -%}
{%- if add_generation_prompt -%}
<|assistant|>{{- '</think>' if (enable_thinking is defined and not enable_thinking) else '<think>' -}}
{%- endif -%}
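For illustration, the assistant tool-call wire format the template above emits (string argument values pass through verbatim, everything else goes through `tojson`) can be sketched with a hypothetical Python helper:

```python
import json

def render_tool_call(name: str, arguments: dict) -> str:
    # Mirrors the <tool_call>/<arg_key>/<arg_value> serialization in the template.
    parts = [f"<tool_call>{name}"]
    for k, v in arguments.items():
        value = v if isinstance(v, str) else json.dumps(v, ensure_ascii=False)
        parts.append(f"<arg_key>{k}</arg_key><arg_value>{value}</arg_value>")
    parts.append("</tool_call>")
    return "".join(parts)

print(render_tool_call("get_weather", {"city": "Beijing", "days": 3}))
# <tool_call>get_weather<arg_key>city</arg_key><arg_value>Beijing</arg_value><arg_key>days</arg_key><arg_value>3</arg_value></tool_call>
```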
config.json ADDED
@@ -0,0 +1,75 @@
{
  "name_or_path": "tclf90/GLM-5-AWQ",
  "architectures": [
    "GlmMoeDsaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "dtype": "bfloat16",
  "eos_token_id": [
    154820,
    154827,
    154829
  ],
  "ep_size": 1,
  "first_k_dense_replace": 3,
  "hidden_act": "silu",
  "head_dim": 64,
  "hidden_size": 6144,
  "index_head_dim": 128,
  "index_n_heads": 32,
  "index_topk": 2048,
  "indexer_rope_interleave": true,
  "initializer_range": 0.02,
  "intermediate_size": 12288,
  "kv_lora_rank": 512,
  "max_position_embeddings": 202752,
  "moe_intermediate_size": 2048,
  "moe_layer_freq": 1,
  "model_type": "glm_moe_dsa",
  "n_group": 1,
  "n_routed_experts": 256,
  "n_shared_experts": 1,
  "norm_topk_prob": true,
  "num_attention_heads": 64,
  "num_experts_per_tok": 8,
  "num_hidden_layers": 78,
  "num_key_value_heads": 64,
  "num_nextn_predict_layers": 1,
  "pad_token_id": 154820,
  "pretraining_tp": 1,
  "q_lora_rank": 2048,
  "qk_head_dim": 256,
  "qk_nope_head_dim": 192,
  "qk_rope_head_dim": 64,
  "rms_norm_eps": 1e-05,
  "rope_interleave": true,
  "rope_parameters": {
    "rope_theta": 1000000,
    "rope_type": "default"
  },
  "routed_scaling_factor": 2.5,
  "scoring_func": "sigmoid",
  "tie_word_embeddings": false,
  "topk_group": 1,
  "topk_method": "noaux_tc",
  "transformers_version": "5.0.2.dev0",
  "use_cache": true,
  "v_head_dim": 256,
  "vocab_size": 154880,
  "quantization_config": {
    "quant_method": "awq",
    "bits": 4,
    "group_size": 128,
    "version": "gemm",
    "zero_point": true,
    "modules_to_not_convert": [
      "self_attn",
      "shared_expert",
      "mlp.gate",
      "model.layers.0.",
      "model.layers.1.",
      "model.layers.2."
    ]
  }
}
generation_config.json ADDED
@@ -0,0 +1,12 @@
{
  "_from_model_config": true,
  "eos_token_id": [
    154820,
    154827,
    154829
  ],
  "pad_token_id": 154820,
  "temperature": 1.0,
  "top_p": 0.95,
  "transformers_version": "5.0.2.dev0"
}
model-00082-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f2ea5c2c328515811836d05f2524b7b8f0e85e09577db536eb2e2a08a5dfc146
size 2994214048
model-00084-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3ee21677b046e8c42f2b2690ffcb64fa4eb22cf0ab095333719ffb5341527aae
size 2994213912
model-00085-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9e3e4d8c454ffa0bf5cf26874d45e9936fd0b85ab26d600f203bbf26227fc7af
size 2999875360
model-00087-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:acf734e2a5d20180cc07d1f8aab0f67679e2186fda641a4b7ac2b886496f1683
size 2999875608
model-00089-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:15513bec5e3894f6a14753639fa38fbe7ffdb3f969cc57b403716953dba5b39b
size 2999875840
model-00090-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f88cd200b518043691e1b03298474ac2db757f3daa27baea0c3c16bcefd48cd4
size 2992576552
model-00091-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:52367e137f82654b72fbf56e8b49da66038fe4ec074efc5a9c7a33ccf9d610e5
size 2994975120
model-00092-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f53c4832f6ca555fb0dee23e0dfa1807cc990de64d37c9bf49a4489b5f2e6c3d
size 2999875224
model-00093-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2b36a5a3547a3d45e6c54360c116458a5448cf8d6d1c0e356a0d970c7df48ac4
size 2994214040
model-00094-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ca572f2a22e4e494793c062f826fd0a267fe5ad3c84ff0d7b12c1cd7b55b2403
size 2999875224
model-00095-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e4b5110054e1c34c3e1d2b88c96d98265b2c488e4121ef7db5e254806b06eb34
size 2994213784
model-00097-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0d864701fc5d1bf0e27c69e585c571bf3f26ac132627a704fd94a7a60c3ab7ee
size 2994213536
model-00099-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9c901ab44ebe3a8493d51f1a6bedd4c9167700c1793cdb4c7511a975e95437aa
size 2998537240
model-00100-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6a948430ab086b1bdfbaec32c3742902b41b82583e293b6ea9813a7bfd609f34
size 2995552016
model-00101-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:498b1a46e1c452d12311246d1c1ea79769cd548e31be2e2a0ece44a7624f9bd3
size 2999875216
model-00102-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b6c9927fd89559146117d11582388c3f7691f05520103cfa7a1791a2f1d69332
size 2994214040
model-00104-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e77c9c59d25c05feb00490e87f166b4176958be273a3a26d71a244c3d1b1deed
size 2994213928
model-00105-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ffbf45bc5cf4cca0de28f56d00dbdabe2102f3ece747b0a7cc5e98a22ac71b02
size 2999875344
model-00107-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0d22e0acdbe80f00ffe9e578e4d93a8886b98353f52ad7b8111121066944073f
size 2999875592
model-00108-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d158752d0c23d921b80b558959a9c2910ac29c35389a2ad1becef70f068c6cd3
size 2994213424
model-00109-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d3e944715639b341e0c6ebf68de9993f551677488eb2bff1bf157deea8d20f86
size 2999875824
model-00112-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d1fbb90a6fb222a416726ef9b0fe0eafcd5f30802e7f1e44ac174ab35c6b0813
size 3000071960
model-00114-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0273e507b3508affbf235a930bcc83a5a3d10f06d804e463a6cfb611446552fe
size 2999875216
model-00115-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7918c048056a14fefdff45da8704be02fa90fc26bbf3dff43bb897be522638ee
size 2994213880
model-00116-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0e3b81392cb747c98efa304ab472cbd1837d53eece256335b31ca118483970d7
size 2999875392
model-00117-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dfe32ad3844e322646646d8186688775a27f53a01ac19df7ac8ee89c19cd518c
size 2994213632
model-00118-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f8ada5ea3d11290182ba5056a8cca1cce1483eedacebd583dd9bb8f88c5a6c99
size 2999875632
model-00119-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f42052b3b899901dbee4787feb8798cea18f4c639ec4d8ea830f1c0bfee44380
size 2994213384
model-00123-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:955a63f081dd7b56738b34f84aedc035a8e0544d46886a3890a96e9b3135495f
size 2999875224
model-00124-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:dbc3e9591eb95fd53928f95d54a3af613b984536b2efbf05bd3ddd85a149abb2
size 2994214024
model-00128-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f4ed3f8cbafaeb665035c4ad01427852eab7a8bf8a33807d4d280b4799be18d0
size 2994213536
model-00129-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:230a896720f5b8fe22973431080a2ad26927eb2e67c63989dd444151e5a8d57a
size 2999875736
model-00130-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2173c8d9e91dcff2a64dd5b277c933b5393007637aaa93ad9c31fb164d9c8989
size 2999532328
model-00131-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:67fb5585c3832df24d54d468d61113589b40d94bc146758f566519abf4f71e3e
size 2994556920
model-00132-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:88d2cfc12c0ae16995b18064ed23ced1c34a7596a7c283b39da0c76ba33df97e
size 2999875216
model-00133-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:31b0d6ab754aa92f9764649856a06c3fefc09450da6d9a346e1f194f49090b71
size 2994214048
model-00135-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ac988353923bbe074747530ccb2ec0cb536f5e2925e1f98e95c9f26b1bbdf695
size 2994213912
model-00138-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:59c3ecf3e08fa09f4ef9b850db2b85b505245d241b88a9b37e2f1f48fb225a80
size 2999875608
model-00139-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5128360abce7b1276c47d47ffe56170f23625bf5c6f5b48fffb006577ea3ef59
size 2994213416
model-00141-of-00141.safetensors ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4abf9afc9bd62201a4d0fbe0e8a393269d5e8e2658be83b1c4e5c8eb84d4127b
size 2054210192
model.safetensors.index.json ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2b154ef2838068217de8fa81b2f8d5242f2c33bfe7f6d057f0f72d46f53a458c
size 16093166
tokenizer.json ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:19e773648cb4e65de8660ea6365e10acca112d42a854923df93db4a6f333a82d
size 20217442