Instructions to use unsloth/GLM-5.2 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use unsloth/GLM-5.2 with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="unsloth/GLM-5.2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly (config.json declares the GlmMoeDsaForCausalLM architecture)
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("unsloth/GLM-5.2")
model = AutoModelForCausalLM.from_pretrained("unsloth/GLM-5.2")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Notebooks
- Google Colab
- Kaggle
- Local Apps Settings
- vLLM
How to use unsloth/GLM-5.2 with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "unsloth/GLM-5.2"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "unsloth/GLM-5.2",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker
docker model run hf.co/unsloth/GLM-5.2
- SGLang
How to use unsloth/GLM-5.2 with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "unsloth/GLM-5.2" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "unsloth/GLM-5.2",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
Use Docker images
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "unsloth/GLM-5.2" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "unsloth/GLM-5.2",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
- Unsloth Studio
How to use unsloth/GLM-5.2 with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for unsloth/GLM-5.2 to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for unsloth/GLM-5.2 to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for unsloth/GLM-5.2 to start chatting
Load model with FastModel
# In a shell:
pip install unsloth

# In Python:
from unsloth import FastModel

model, tokenizer = FastModel.from_pretrained(
    model_name="unsloth/GLM-5.2",
    max_seq_length=2048,
)
- Docker Model Runner
How to use unsloth/GLM-5.2 with Docker Model Runner:
docker model run hf.co/unsloth/GLM-5.2
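The curl calls above can also be issued from Python. The following is a minimal sketch using only the standard library, assuming a server started with `vllm serve` is listening on `http://localhost:8000` (use port 30000 for the SGLang commands). The helper names `build_chat_request` and `chat` are hypothetical, and `max_tokens` is a standard OpenAI-style field that the snippets above do not set.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Build an OpenAI-style chat-completions payload for a single user turn."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(base_url: str, payload: dict, timeout: float = 60.0) -> str:
    """POST the payload to {base_url}/v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]


# Example (requires a running server, so it is left commented out):
# payload = build_chat_request("unsloth/GLM-5.2", "What is the capital of France?")
# print(chat("http://localhost:8000", payload))
```

Keeping payload construction separate from the network call makes the request easy to inspect or log before it is sent.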
Upload folder using huggingface_hub
This view is limited to 50 files because it contains too many changes.
- .gitattributes +1 -0
- LICENSE +21 -0
- README.md +112 -0
- chat_template.jinja +119 -0
- config.json +306 -0
- generation_config.json +12 -0
- model-00001-of-00282.safetensors +3 -0
- model-00002-of-00282.safetensors +3 -0
- model-00003-of-00282.safetensors +3 -0
- model-00004-of-00282.safetensors +3 -0
- model-00005-of-00282.safetensors +3 -0
- model-00006-of-00282.safetensors +3 -0
- model-00007-of-00282.safetensors +3 -0
- model-00008-of-00282.safetensors +3 -0
- model-00009-of-00282.safetensors +3 -0
- model-00010-of-00282.safetensors +3 -0
- model-00011-of-00282.safetensors +3 -0
- model-00012-of-00282.safetensors +3 -0
- model-00013-of-00282.safetensors +3 -0
- model-00014-of-00282.safetensors +3 -0
- model-00015-of-00282.safetensors +3 -0
- model-00016-of-00282.safetensors +3 -0
- model-00017-of-00282.safetensors +3 -0
- model-00018-of-00282.safetensors +3 -0
- model-00019-of-00282.safetensors +3 -0
- model-00020-of-00282.safetensors +3 -0
- model-00021-of-00282.safetensors +3 -0
- model-00022-of-00282.safetensors +3 -0
- model-00023-of-00282.safetensors +3 -0
- model-00024-of-00282.safetensors +3 -0
- model-00025-of-00282.safetensors +3 -0
- model-00026-of-00282.safetensors +3 -0
- model-00027-of-00282.safetensors +3 -0
- model-00028-of-00282.safetensors +3 -0
- model-00029-of-00282.safetensors +3 -0
- model-00030-of-00282.safetensors +3 -0
- model-00031-of-00282.safetensors +3 -0
- model-00032-of-00282.safetensors +3 -0
- model-00033-of-00282.safetensors +3 -0
- model-00034-of-00282.safetensors +3 -0
- model-00035-of-00282.safetensors +3 -0
- model-00036-of-00282.safetensors +3 -0
- model-00037-of-00282.safetensors +3 -0
- model-00038-of-00282.safetensors +3 -0
- model-00039-of-00282.safetensors +3 -0
- model-00040-of-00282.safetensors +3 -0
- model-00041-of-00282.safetensors +3 -0
- model-00042-of-00282.safetensors +3 -0
- model-00043-of-00282.safetensors +3 -0
- model-00044-of-00282.safetensors +3 -0
.gitattributes
CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
LICENSE
ADDED
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2026 Zhipu AI
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
README.md
ADDED
@@ -0,0 +1,112 @@
+---
+tags:
+- unsloth
+base_model:
+- zai-org/GLM-5.2
+language:
+- en
+- zh
+library_name: transformers
+license: mit
+pipeline_tag: text-generation
+---
+<div>
+<p style="margin-top: 0;margin-bottom: 0;">
+<em><a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0</a> achieves superior accuracy & outperforms other leading quants.</em>
+</p>
+<div style="display: flex; gap: 5px; align-items: center; ">
+<a href="https://github.com/unslothai/unsloth/">
+<img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133">
+</a>
+<a href="https://discord.gg/unsloth">
+<img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
+</a>
+<a href="https://docs.unsloth.ai/">
+<img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
+</a>
+</div>
+</div>
+
+
+# GLM-5.2
+
+<div align="center">
+<img src="https://raw.githubusercontent.com/zai-org/GLM-5/refs/heads/main/resources/logo.svg" width="15%"/>
+</div>
+<p align="center">
+👋 Join our <a href="https://raw.githubusercontent.com/zai-org/GLM-5/refs/heads/main/resources/wechat.png" target="_blank">WeChat</a> or <a href="https://discord.gg/QR7SARHRxK" target="_blank">Discord</a> community.
+<br>
+📖 Check out the GLM-5.2 <a href="https://z.ai/blog/glm-5.2" target="_blank">blog</a> and GLM-5 <a href="https://arxiv.org/abs/2602.15763" target="_blank">Technical report</a>.
+<br>
+📍 Use GLM-5.2 API services on <a href="https://docs.z.ai/guides/llm/glm-5.2">Z.ai API Platform</a>.
+<br>
+🔜 Try GLM-5.2 <a href="https://chat.z.ai">here</a>.
+</p>
+
+<p align="center">
+[<a href="https://huggingface.co/papers/2602.15763" target="_blank">Paper</a>]
+[<a href="https://github.com/zai-org/GLM-5" target="_blank">GitHub</a>]
+</p>
+
+## Introduction
+
+We're introducing GLM-5.2, our latest flagship model for long-horizon tasks. It marks a substantial leap in long-horizon task capability over its predecessor GLM-5.1 and, for the first time, delivers that capability on a **solid 1M-token context**. GLM-5.2's new capabilities include:
+- **Solid 1M Context**: A 1M-token context that stably sustains long-horizon work
+- **Advanced Coding with Flexible Effort**: Stronger coding capabilities with multiple thinking effort levels to balance performance and latency
+- **Improved Architecture**: We propose [IndexShare](https://arxiv.org/abs/2603.12201), which reuses the same indexer across every four sparse attention layers, reducing per-token FLOPs by 2.9× at a 1M context length. We also improve GLM-5.2’s MTP layer for speculative decoding, increasing the acceptance length by up to 20%
+- **Pure Open**: An MIT open-source license with no regional limits and technical access without borders
+
+
+
+## Benchmark
+
+|Benchmark|GLM-5.2|GLM-5.1|Qwen3.7-Max|MiniMax M3|DeepSeek-V4-Pro|Claude Opus 4.8|GPT-5.5|Gemini 3.1 Pro|
+|:---|:---:|:---:|:---:|:---:|:---:|:---:|:---:|:---:|
+|Reasoning|||||||||
+|HLE|40.5|31|41.4|37|37.7|49.8*|41.4*|45|
+|HLE (w/ Tools)|54.7|52.3|53.5|-|48.2|57.9*|52.2*|51.4*|
+|CritPt|16.7|4.6|13.4|3.7|12.9|20.9|27.1|17.7|
+|AIME 2026|99.2|95.3|97|-|94.6|95.7|98.3|98.2|
+|HMMT Nov. 2025|94.4|94|95|84.4|94.4|96.5|96.5|94.8|
+|HMMT Feb. 2026|92.5|82.6|97.1|84.4|95.2|96.7|96.7|87.3|
+|IMOAnswerBench|91.0|83.8|90|-|89.8|83.5|-|81|
+|GPQA-Diamond|91.2|86.2|90|93|90.1|93.6|93.6|94.3|
+|Coding|||||||||
+|SWE-bench Pro|62.1|58.4|60.6|59|55.4|69.2|58.6|54.2|
+|NL2Repo|48.9|42.7|47.2|42.1|35.5|69.7|50.7|33.4|
+|DeepSWE|46.2|18|18|20|8|58|70|10|
+|ProgramBench|63.7|50.9|-|-|47.8|71.9|70.8|39.5|
+|Terminal Bench 2.1 (Terminus-2)|81.0|63.5|75|65|64|85|84|74|
+|Terminal Bench 2.1 (Best Reported Harness)|82.7|69|-|-|-|78.9|83.4|70.7|
+|FrontierSWE (Dominance)|74.4|30.5|-|-|29.0|75.1|72.6|39.6|
+|PostTrainBench|34.3|20.1|-|-|-|37.2|28.4|21.6|
+|SWE-Marathon|13.0|1.0|-|-|-|26.0|12.0|4.0|
+|Agentic|||||||||
+|MCP-Atlas (Public Set)|76.8|71.8|76.4|74.2|73.6|77.8|75.3|69.2|
+|Tool-Decathlon|48.2|40.7|-|-|52.8|59.9|55.6|48.8|
+
+## Serve GLM-5.2 Locally
+
+The following open-source frameworks support local deployment of GLM-5.2:
+
+- [SGLang](https://github.com/sgl-project/sglang) (v0.5.13.post1+), see [cookbook](https://cookbook.sglang.io/autoregressive/GLM/GLM-5.2)
+- [vLLM](https://github.com/vllm-project/vllm) (v0.23.0+), see [recipes](https://github.com/vllm-project/recipes/blob/main/GLM/GLM5.md)
+- [xLLM](https://github.com/jd-opensource/xllm) (v0.10.0+), see [example](https://github.com/zai-org/GLM-5/blob/main/example/ascend.md)
+- [Transformers](https://github.com/huggingface/transformers) (v0.5.12+), see [transformers docs](https://github.com/huggingface/transformers/blob/main/docs/source/en/model_doc/glm_moe_dsa.md)
+- [KTransformers](https://github.com/kvcache-ai/ktransformers) (v0.5.12+), see [tutorial](https://github.com/kvcache-ai/ktransformers/blob/main/doc/en/kt-kernel/GLM-5.2-Tutorial.md)
+
+## Citation
+
+If you find GLM-5.2 useful in your research, please cite our technical report:
+
+```bibtex
+@misc{glm5team2026glm5vibecodingagentic,
+  title={GLM-5: from Vibe Coding to Agentic Engineering},
+  author={GLM-5-Team and Aohan Zeng and Xin Lv and Zhenyu Hou and Zhengxiao Du and Qinkai Zheng and Bin Chen and Da Yin and Chendi Ge and Chenghua Huang and Chengxing Xie and Chenzheng Zhu and Congfeng Yin and Cunxiang Wang and Gengzheng Pan and Hao Zeng and Haoke Zhang and Haoran Wang and Huilong Chen and Jiajie Zhang and Jian Jiao and Jiaqi Guo and Jingsen Wang and Jingzhao Du and Jinzhu Wu and Kedong Wang and Lei Li and Lin Fan and Lucen Zhong and Mingdao Liu and Mingming Zhao and Pengfan Du and Qian Dong and Rui Lu and Shuang-Li and Shulin Cao and Song Liu and Ting Jiang and Xiaodong Chen and Xiaohan Zhang and Xuancheng Huang and Xuezhen Dong and Yabo Xu and Yao Wei and Yifan An and Yilin Niu and Yitong Zhu and Yuanhao Wen and Yukuo Cen and Yushi Bai and Zhongpei Qiao and Zihan Wang and Zikang Wang and Zilin Zhu and Ziqiang Liu and Zixuan Li and Bojie Wang and Bosi Wen and Can Huang and Changpeng Cai and Chao Yu and Chen Li and Chengwei Hu and Chenhui Zhang and Dan Zhang and Daoyan Lin and Dayong Yang and Di Wang and Ding Ai and Erle Zhu and Fangzhou Yi and Feiyu Chen and Guohong Wen and Hailong Sun and Haisha Zhao and Haiyi Hu and Hanchen Zhang and Hanrui Liu and Hanyu Zhang and Hao Peng and Hao Tai and Haobo Zhang and He Liu and Hongwei Wang and Hongxi Yan and Hongyu Ge and Huan Liu and Huanpeng Chu and Jia'ni Zhao and Jiachen Wang and Jiajing Zhao and Jiamin Ren and Jiapeng Wang and Jiaxin Zhang and Jiayi Gui and Jiayue Zhao and Jijie Li and Jing An and Jing Li and Jingwei Yuan and Jinhua Du and Jinxin Liu and Junkai Zhi and Junwen Duan and Kaiyue Zhou and Kangjian Wei and Ke Wang and Keyun Luo and Laiqiang Zhang and Leigang Sha and Liang Xu and Lindong Wu and Lintao Ding and Lu Chen and Minghao Li and Nianyi Lin and Pan Ta and Qiang Zou and Rongjun Song and Ruiqi Yang and Shangqing Tu and Shangtong Yang and Shaoxiang Wu and Shengyan Zhang and Shijie Li and Shuang Li and Shuyi Fan and Wei Qin and Wei Tian and Weining Zhang and Wenbo Yu and Wenjie Liang and Xiang Kuang and Xiangmeng Cheng and Xiangyang Li and Xiaoquan Yan and Xiaowei Hu and Xiaoying Ling and Xing Fan and Xingye Xia and Xinyuan Zhang and Xinze Zhang and Xirui Pan and Xu Zou and Xunkai Zhang and Yadi Liu and Yandong Wu and Yanfu Li and Yidong Wang and Yifan Zhu and Yijun Tan and Yilin Zhou and Yiming Pan and Ying Zhang and Yinpei Su and Yipeng Geng and Yong Yan and Yonglin Tan and Yuean Bi and Yuhan Shen and Yuhao Yang and Yujiang Li and Yunan Liu and Yunqing Wang and Yuntao Li and Yurong Wu and Yutao Zhang and Yuxi Duan and Yuxuan Zhang and Zezhen Liu and Zhengtao Jiang and Zhenhe Yan and Zheyu Zhang and Zhixiang Wei and Zhuo Chen and Zhuoer Feng and Zijun Yao and Ziwei Chai and Ziyuan Wang and Zuzhou Zhang and Bin Xu and Minlie Huang and Hongning Wang and Juanzi Li and Yuxiao Dong and Jie Tang},
+  year={2026},
+  eprint={2602.15763},
+  archivePrefix={arXiv},
+  primaryClass={cs.LG},
+  url={https://arxiv.org/abs/2602.15763},
+}
+```
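The README describes IndexShare as reusing one indexer across every four sparse attention layers, and `config.json` in this commit spells that out as a 78-entry `indexer_types` list alongside `index_topk_freq: 4` and `index_skip_topk_offset: 3`. The sketch below reproduces that list from a simple rule. The rule is inferred from the listed values only, and mapping the two config fields onto the hypothetical parameters `freq` and `offset` is an assumption, not a statement about how any inference engine reads them.

```python
def indexer_layout(num_layers: int, offset: int, freq: int) -> list[str]:
    """Return 'full' for layers that run their own indexer and 'shared' for layers that reuse one.

    Assumed rule (inferred from the indexer_types list in config.json):
    the first `offset` layers are 'full', then every `freq`-th layer after that is 'full'.
    """
    layout = []
    for layer in range(num_layers):
        own_index = layer < offset or (layer - offset + 1) % freq == 0
        layout.append("full" if own_index else "shared")
    return layout


layout = indexer_layout(num_layers=78, offset=3, freq=4)
full_layers = layout.count("full")
shared_layers = layout.count("shared")
print(f"{full_layers} layers compute an index, {shared_layers} layers reuse one")
```

Under this reading, roughly one layer in four pays the indexer cost once past the first few layers.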
chat_template.jinja
ADDED
@@ -0,0 +1,119 @@
+[gMASK]<sop>
+{%- set effective_reasoning_effort = 'high' if reasoning_effort is defined and reasoning_effort == 'high' else 'max' -%}
+{%- if (enable_thinking is not defined or enable_thinking) and effective_reasoning_effort is not none -%}<|system|>Reasoning Effort: {{ effective_reasoning_effort | capitalize }}{%- endif -%}
+{%- if tools -%}
+{%- macro tool_to_json(tool) -%}
+{%- set ns_tool = namespace(first=true) -%}
+{{ '{' -}}
+{%- for k, v in tool.items() -%}
+{%- if k != 'defer_loading' and k != 'strict' -%}
+{%- if not ns_tool.first -%}{{- ', ' -}}{%- endif -%}
+{%- set ns_tool.first = false -%}
+"{{ k }}": {{ v | tojson(ensure_ascii=False) }}
+{%- endif -%}
+{%- endfor -%}
+{{- '}' -}}
+{%- endmacro -%}
+<|system|>
+# Tools
+
+You may call one or more functions to assist with the user query.
+
+You are provided with function signatures within <tools></tools> XML tags:
+<tools>
+{% for tool in tools %}
+{%- if tool is not none and tool is mapping and 'function' in tool -%}
+{%- set tool = tool['function'] -%}
+{%- endif -%}
+{% if tool.defer_loading is not defined or not tool.defer_loading %}
+{{ tool_to_json(tool) }}
+{% endif %}
+{% endfor %}
+</tools>
+
+For each function call, output the function name and arguments within the following XML format:
+<tool_call>{function-name}<arg_key>{arg-key-1}</arg_key><arg_value>{arg-value-1}</arg_value><arg_key>{arg-key-2}</arg_key><arg_value>{arg-value-2}</arg_value>...</tool_call>{%- endif -%}
+{%- macro visible_text(content) -%}
+{%- if content is string -%}
+{{- content }}
+{%- elif content is iterable and content is not mapping -%}
+{%- for item in content -%}
+{%- if item is mapping and item.type == 'text' -%}
+{{- item.text }}
+{%- elif item is string -%}
+{{- item }}
+{%- elif item is mapping and item.type in ['image', 'image_url', 'video', 'video_url', 'audio', 'audio_url', 'input_audio'] -%}
+{%- set media_type = item.type | replace('_url', '') | replace('input_', '') -%}
+{{- "<reminder>You are unable to process this " ~ media_type ~ " because you don't have multi-modal input ability. Try different methods.</reminder>" }}
+{%- endif -%}
+{%- endfor -%}
+{%- else -%}
+{{- content }}
+{%- endif -%}
+{%- endmacro -%}
+{%- set ns = namespace(last_user_index=-1) -%}
+{%- for m in messages %}
+{%- if m is not none and m is mapping and m.role == 'user' %}
+{%- set ns.last_user_index = loop.index0 -%}
+{%- endif %}
+{%- endfor %}
+{%- for m in messages -%}
+{%- if m is not none and m is mapping and m.role == 'user' -%}<|user|>{{ visible_text(m.content) }}
+{%- elif m.role == 'assistant' -%}
+<|assistant|>
+{%- set content = visible_text(m.content) %}
+{%- if m.reasoning_content is string %}
+{%- set reasoning_content = m.reasoning_content %}
+{%- elif '</think>' in content %}
+{%- set reasoning_content = content.split('</think>')[0].split('<think>')[-1] %}
+{%- set content = content.split('</think>')[-1] %}
+{%- endif %}
+{%- if ((clear_thinking is defined and not clear_thinking) or loop.index0 > ns.last_user_index) and reasoning_content is defined -%}
+{{ '<think>' + reasoning_content + '</think>'}}
+{%- else -%}
+{{ '<think></think>' }}
+{%- endif -%}
+{%- if content.strip() -%}
+{{ content.strip() }}
+{%- endif -%}
+{% if m.tool_calls %}
+{% for tc in m.tool_calls %}
+{%- if tc.function %}
+{%- set tc = tc.function %}
+{%- endif %}
+{{- '<tool_call>' + tc.name -}}
+{% set _args = tc.arguments %}{% for k, v in _args.items() %}<arg_key>{{ k }}</arg_key><arg_value>{{ v | tojson(ensure_ascii=False) if v is not string else v }}</arg_value>{% endfor %}</tool_call>{% endfor %}
+{% endif %}
+{%- elif m.role == 'tool' -%}
+{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
+{{- '<|observation|>' -}}
+{%- endif %}
+{%- if m.content is string -%}
+{{- '<tool_response>' + m.content + '</tool_response>' -}}
+{%- elif m.content is iterable and m.content is not mapping and m.content and m.content.0.type == "tool_reference" -%}
+{{- '<tool_response><tools>\n' -}}
+{% for tr in m.content %}
+{%- for tool in tools -%}
+{%- if tool is not none and tool is mapping and 'function' in tool -%}
+{%- set tool = tool['function'] -%}
+{%- endif -%}
+{%- if tool is not none and tool is mapping and tool.name == tr.name -%}
+{{- tool_to_json(tool) + '\n' -}}
+{%- endif -%}
+{%- endfor -%}
+{%- endfor -%}
+{{- '</tools></tool_response>' -}}
+{%- elif m.content is iterable and m.content is not mapping and m.content and m.content.0 is mapping and m.content.0.output is defined -%}
+{%- for tr in m.content -%}
+{{- '<tool_response>' + tr.output + '</tool_response>' -}}
+{%- endfor -%}
+{%- else -%}
+{{- '<tool_response>' + visible_text(m.content) + '</tool_response>' -}}
+{% endif -%}
+{%- elif m.role == 'system' -%}
+<|system|>{{ visible_text(m.content) }}
+{%- endif -%}
+{%- endfor -%}
+{%- if add_generation_prompt -%}
+<|assistant|>{{- '<think></think>' if (enable_thinking is defined and not enable_thinking) else '<think>' -}}
+{%- endif -%}
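Two pieces of the template are easier to follow as plain Python: how an assistant message is split into reasoning and visible text (template lines 67 to 69, with the strip from line 77), and how a tool call is serialized into `<arg_key>`/`<arg_value>` pairs (lines 84 and 85). The sketch below mirrors that logic under stated assumptions. The function names are hypothetical, `json.dumps` stands in for the Jinja `tojson` filter, and the empty-string return for messages without `</think>` is a simplification, since the template leaves `reasoning_content` undefined in that case.

```python
import json


def split_reasoning(content: str) -> tuple[str, str]:
    """Separate the <think>...</think> span from the visible reply text."""
    if "</think>" not in content:
        return "", content.strip()
    # Same split chain as the template: text before </think>, after the last <think>
    reasoning = content.split("</think>")[0].split("<think>")[-1]
    visible = content.split("</think>")[-1]
    return reasoning, visible.strip()


def format_tool_call(name: str, arguments: dict) -> str:
    """Serialize one tool call the way the template renders assistant tool_calls."""
    parts = [f"<tool_call>{name}"]
    for key, value in arguments.items():
        # Strings are emitted verbatim; every other type is JSON-encoded
        rendered = value if isinstance(value, str) else json.dumps(value, ensure_ascii=False)
        parts.append(f"<arg_key>{key}</arg_key><arg_value>{rendered}</arg_value>")
    parts.append("</tool_call>")
    return "".join(parts)


reasoning, visible = split_reasoning("<think>plan the answer</think>\nHello!")
call = format_tool_call("get_weather", {"city": "Paris", "days": 3})
print(reasoning, visible, call, sep="\n")
```

Emitting string arguments verbatim while JSON-encoding other types keeps string values free of extra quotes in the rendered prompt.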
config.json
ADDED
@@ -0,0 +1,306 @@
| 1 |
+
{
|
| 2 |
+
"architectures": [
|
| 3 |
+
"GlmMoeDsaForCausalLM"
|
| 4 |
+
],
|
| 5 |
+
"attention_bias": false,
|
| 6 |
+
"attention_dropout": 0.0,
|
| 7 |
+
"torch_dtype": "bfloat16",
|
| 8 |
+
"eos_token_id": [
|
| 9 |
+
154820,
|
| 10 |
+
154827,
|
| 11 |
+
154829
|
| 12 |
+
],
|
| 13 |
+
"ep_size": 1,
|
| 14 |
+
"first_k_dense_replace": 3,
|
| 15 |
+
"head_dim": 64,
|
| 16 |
+
"hidden_act": "silu",
|
| 17 |
+
"hidden_size": 6144,
|
| 18 |
+
"index_head_dim": 128,
|
| 19 |
+
"index_n_heads": 32,
|
| 20 |
+
"index_share_for_mtp_iteration": true,
|
| 21 |
+
"index_skip_topk_offset": 3,
|
| 22 |
+
"index_topk": 2048,
|
| 23 |
+
"index_topk_freq": 4,
|
| 24 |
+
"index_topk_pattern": null,
|
| 25 |
+
"indexer_rope_interleave": true,
|
| 26 |
+
"indexer_types": [
|
| 27 |
+
"full",
|
| 28 |
+
"full",
|
| 29 |
+
"full",
|
| 30 |
+
"shared",
|
| 31 |
+
"shared",
|
| 32 |
+
"shared",
|
| 33 |
+
"full",
|
| 34 |
+
"shared",
|
| 35 |
+
"shared",
|
| 36 |
+
"shared",
|
| 37 |
+
"full",
|
| 38 |
+
"shared",
|
| 39 |
+
"shared",
|
| 40 |
+
"shared",
|
| 41 |
+
"full",
|
| 42 |
+
"shared",
|
| 43 |
+
"shared",
|
| 44 |
+
"shared",
|
| 45 |
+
"full",
|
| 46 |
+
"shared",
|
| 47 |
+
"shared",
|
| 48 |
+
"shared",
|
| 49 |
+
"full",
|
| 50 |
+
"shared",
|
| 51 |
+
"shared",
|
| 52 |
+
"shared",
|
| 53 |
+
"full",
|
| 54 |
+
"shared",
|
| 55 |
+
"shared",
|
| 56 |
+
"shared",
|
| 57 |
+
"full",
|
| 58 |
+
"shared",
|
| 59 |
+
"shared",
|
| 60 |
+
"shared",
|
| 61 |
+
"full",
|
| 62 |
+
"shared",
|
| 63 |
+
"shared",
|
| 64 |
+
"shared",
|
| 65 |
+
"full",
|
| 66 |
+
"shared",
|
| 67 |
+
"shared",
|
| 68 |
+
"shared",
|
| 69 |
+
"full",
|
| 70 |
+
"shared",
|
| 71 |
+
"shared",
|
| 72 |
+
"shared",
|
| 73 |
+
"full",
|
| 74 |
+
"shared",
|
| 75 |
+
"shared",
|
| 76 |
+
"shared",
|
| 77 |
+
"full",
|
| 78 |
+
"shared",
|
| 79 |
+
"shared",
|
| 80 |
+
"shared",
|
| 81 |
+
"full",
|
| 82 |
+
"shared",
|
| 83 |
+
"shared",
|
| 84 |
+
"shared",
|
| 85 |
+
"full",
|
| 86 |
+
"shared",
|
| 87 |
+
"shared",
|
| 88 |
+
"shared",
|
| 89 |
+
"full",
|
| 90 |
+
"shared",
|
| 91 |
+
"shared",
|
| 92 |
+
"shared",
|
| 93 |
+
"full",
|
| 94 |
+
"shared",
|
| 95 |
+
"shared",
|
| 96 |
+
"shared",
|
| 97 |
+
"full",
|
| 98 |
+
"shared",
|
| 99 |
+
"shared",
|
| 100 |
+
"shared",
|
| 101 |
+
"full",
|
| 102 |
+
"shared",
|
| 103 |
+
"shared",
|
| 104 |
+
"shared"
|
| 105 |
+
],
|
| 106 |
+
"initializer_range": 0.02,
|
| 107 |
+
"intermediate_size": 12288,
|
| 108 |
+
"kv_lora_rank": 512,
|
| 109 |
+
"layer_types": [
|
| 110 |
+
"deepseek_sparse_attention",
|
| 111 |
+
"deepseek_sparse_attention",
|
| 112 |
+
"deepseek_sparse_attention",
|
| 113 |
+
"deepseek_sparse_attention",
|
| 114 |
+
"deepseek_sparse_attention",
|
| 115 |
+
"deepseek_sparse_attention",
|
| 116 |
+
"deepseek_sparse_attention",
|
| 117 |
+
"deepseek_sparse_attention",
|
| 118 |
+
"deepseek_sparse_attention",
|
| 119 |
+
"deepseek_sparse_attention",
|
| 120 |
+
"deepseek_sparse_attention",
|
| 121 |
+
"deepseek_sparse_attention",
|
| 122 |
+
"deepseek_sparse_attention",
|
| 123 |
+
"deepseek_sparse_attention",
|
| 124 |
+
"deepseek_sparse_attention",
|
| 125 |
+
"deepseek_sparse_attention",
|
| 126 |
+
"deepseek_sparse_attention",
|
| 127 |
+
"deepseek_sparse_attention",
|
| 128 |
+
"deepseek_sparse_attention",
|
| 129 |
+
"deepseek_sparse_attention",
|
| 130 |
+
"deepseek_sparse_attention",
|
| 131 |
+
"deepseek_sparse_attention",
|
| 132 |
+
"deepseek_sparse_attention",
|
| 133 |
+
"deepseek_sparse_attention",
|
| 134 |
+
"deepseek_sparse_attention",
|
| 135 |
+
"deepseek_sparse_attention",
|
| 136 |
+
"deepseek_sparse_attention",
|
| 137 |
+
"deepseek_sparse_attention",
|
| 138 |
+
"deepseek_sparse_attention",
|
| 139 |
+
"deepseek_sparse_attention",
|
| 140 |
+
"deepseek_sparse_attention",
|
| 141 |
+
"deepseek_sparse_attention",
|
| 142 |
+
"deepseek_sparse_attention",
|
| 143 |
+
"deepseek_sparse_attention",
|
| 144 |
+
"deepseek_sparse_attention",
|
| 145 |
+
"deepseek_sparse_attention",
|
| 146 |
+
"deepseek_sparse_attention",
|
| 147 |
+
"deepseek_sparse_attention",
|
| 148 |
+
"deepseek_sparse_attention",
|
| 149 |
+
"deepseek_sparse_attention",
|
| 150 |
+
"deepseek_sparse_attention",
|
| 151 |
+
"deepseek_sparse_attention",
|
| 152 |
+
"deepseek_sparse_attention",
|
| 153 |
+
"deepseek_sparse_attention",
|
| 154 |
+
"deepseek_sparse_attention",
|
| 155 |
+
"deepseek_sparse_attention",
|
| 156 |
+
"deepseek_sparse_attention",
|
| 157 |
+
"deepseek_sparse_attention",
|
| 158 |
+
"deepseek_sparse_attention",
|
| 159 |
+
"deepseek_sparse_attention",
|
| 160 |
+
"deepseek_sparse_attention",
|
| 161 |
+
"deepseek_sparse_attention",
|
| 162 |
+
"deepseek_sparse_attention",
|
| 163 |
+
"deepseek_sparse_attention",
|
| 164 |
+
"deepseek_sparse_attention",
|
| 165 |
+
"deepseek_sparse_attention",
|
| 166 |
+
"deepseek_sparse_attention",
|
| 167 |
+
"deepseek_sparse_attention",
|
| 168 |
+
"deepseek_sparse_attention",
|
| 169 |
+
"deepseek_sparse_attention",
|
| 170 |
+
"deepseek_sparse_attention",
|
| 171 |
+
"deepseek_sparse_attention",
|
| 172 |
+
"deepseek_sparse_attention",
|
| 173 |
+
"deepseek_sparse_attention",
|
| 174 |
+
"deepseek_sparse_attention",
|
| 175 |
+
"deepseek_sparse_attention",
|
| 176 |
+
"deepseek_sparse_attention",
|
| 177 |
+
"deepseek_sparse_attention",
|
| 178 |
+
"deepseek_sparse_attention",
|
| 179 |
+
"deepseek_sparse_attention",
|
| 180 |
+
"deepseek_sparse_attention",
|
| 181 |
+
"deepseek_sparse_attention",
|
| 182 |
+
"deepseek_sparse_attention",
|
| 183 |
+
"deepseek_sparse_attention",
|
| 184 |
+
"deepseek_sparse_attention",
|
| 185 |
+
"deepseek_sparse_attention",
|
| 186 |
+
"deepseek_sparse_attention",
|
| 187 |
+
"deepseek_sparse_attention"
|
| 188 |
+
],
|
| 189 |
+
"max_position_embeddings": 1048576,
|
| 190 |
+
"mlp_bias": false,
|
| 191 |
+
"mlp_layer_types": [
|
| 192 |
+
"dense",
|
| 193 |
+
"dense",
|
| 194 |
+
"dense",
|
| 195 |
+
"sparse",
|
| 196 |
+
"sparse",
|
| 197 |
+
"sparse",
|
| 198 |
+
"sparse",
|
| 199 |
+
"sparse",
|
| 200 |
+
"sparse",
|
| 201 |
+
"sparse",
|
| 202 |
+
"sparse",
|
| 203 |
+
"sparse",
|
| 204 |
+
"sparse",
|
| 205 |
+
"sparse",
|
| 206 |
+
"sparse",
|
| 207 |
+
"sparse",
|
| 208 |
+
"sparse",
|
| 209 |
+
"sparse",
|
| 210 |
+
"sparse",
|
| 211 |
+
"sparse",
|
| 212 |
+
"sparse",
|
| 213 |
+
"sparse",
|
| 214 |
+
"sparse",
|
| 215 |
+
"sparse",
|
| 216 |
+
"sparse",
|
| 217 |
+
"sparse",
|
| 218 |
+
"sparse",
|
| 219 |
+
"sparse",
|
| 220 |
+
"sparse",
|
| 221 |
+
"sparse",
|
| 222 |
+
"sparse",
|
| 223 |
+
"sparse",
|
| 224 |
+
"sparse",
|
| 225 |
+
"sparse",
|
| 226 |
+
"sparse",
|
| 227 |
+
"sparse",
|
| 228 |
+
"sparse",
|
| 229 |
+
"sparse",
|
| 230 |
+
"sparse",
|
| 231 |
+
"sparse",
|
| 232 |
+
"sparse",
|
| 233 |
+
"sparse",
|
| 234 |
+
"sparse",
|
| 235 |
+
"sparse",
|
| 236 |
+
"sparse",
|
| 237 |
+
"sparse",
|
| 238 |
+
"sparse",
|
| 239 |
+
"sparse",
|
| 240 |
+
"sparse",
|
| 241 |
+
"sparse",
|
| 242 |
+
"sparse",
|
| 243 |
+
"sparse",
|
| 244 |
+
"sparse",
|
| 245 |
+
"sparse",
|
| 246 |
+
"sparse",
|
| 247 |
+
"sparse",
|
| 248 |
+
"sparse",
|
| 249 |
+
"sparse",
|
| 250 |
+
"sparse",
|
| 251 |
+
"sparse",
|
| 252 |
+
"sparse",
|
| 253 |
+
"sparse",
|
| 254 |
+
"sparse",
|
| 255 |
+
"sparse",
|
| 256 |
+
"sparse",
|
| 257 |
+
"sparse",
|
| 258 |
+
"sparse",
|
| 259 |
+
"sparse",
|
| 260 |
+
"sparse",
|
| 261 |
+
"sparse",
|
| 262 |
+
"sparse",
|
| 263 |
+
"sparse",
|
| 264 |
+
"sparse",
|
| 265 |
+
"sparse",
|
| 266 |
+
"sparse",
|
| 267 |
+
"sparse",
|
| 268 |
+
"sparse",
|
| 269 |
+
"sparse"
|
| 270 |
+
],
|
+  "model_type": "glm_moe_dsa",
+  "moe_intermediate_size": 2048,
+  "moe_layer_freq": 1,
+  "n_group": 1,
+  "n_routed_experts": 256,
+  "n_shared_experts": 1,
+  "norm_topk_prob": true,
+  "num_attention_heads": 64,
+  "num_experts": 256,
+  "num_experts_per_tok": 8,
+  "num_hidden_layers": 78,
+  "num_key_value_heads": 64,
+  "num_nextn_predict_layers": 1,
+  "pad_token_id": 154821,
+  "pretraining_tp": 1,
+  "q_lora_rank": 2048,
+  "qk_head_dim": 256,
+  "qk_nope_head_dim": 192,
+  "qk_rope_head_dim": 64,
+  "rms_norm_eps": 1e-05,
+  "rope_interleave": true,
+  "rope_parameters": {
+    "rope_theta": 8000000,
+    "rope_type": "default"
+  },
+  "routed_scaling_factor": 2.5,
+  "scoring_func": "sigmoid",
+  "tie_word_embeddings": false,
+  "topk_group": 1,
+  "topk_method": "noaux_tc",
+  "transformers_version": "5.13.0.dev0",
+  "unsloth_fixed": true,
+  "use_cache": true,
+  "v_head_dim": 256,
+  "vocab_size": 154880
+}
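The fields added to config.json can be cross-checked against each other. The following is a minimal sketch, not part of the commit: the `config` dict is a hand-copied excerpt of values from the diff above, and `check_config` and `active_expert_fraction` are hypothetical helper names. It assumes that `mlp_layer_types` carries one entry per hidden layer and that `qk_head_dim` is the sum of `qk_nope_head_dim` and `qk_rope_head_dim`; neither relationship is stated in the diff itself.

```python
# Hand-copied excerpt of values from the config.json diff (not the full file).
config = {
    "num_hidden_layers": 78,
    "mlp_layer_types": ["dense"] * 3 + ["sparse"] * 75,
    "qk_head_dim": 256,
    "qk_nope_head_dim": 192,
    "qk_rope_head_dim": 64,
    "n_routed_experts": 256,
    "n_shared_experts": 1,
    "num_experts_per_tok": 8,
}


def check_config(cfg):
    """Return a list of problems; an empty list means the excerpt is consistent.

    The relationships checked here are assumptions about how the fields relate,
    not rules taken from the commit.
    """
    problems = []
    if len(cfg["mlp_layer_types"]) != cfg["num_hidden_layers"]:
        problems.append("mlp_layer_types length differs from num_hidden_layers")
    if cfg["qk_nope_head_dim"] + cfg["qk_rope_head_dim"] != cfg["qk_head_dim"]:
        problems.append("qk_nope_head_dim + qk_rope_head_dim differs from qk_head_dim")
    if cfg["num_experts_per_tok"] > cfg["n_routed_experts"]:
        problems.append("num_experts_per_tok exceeds n_routed_experts")
    return problems


def active_expert_fraction(cfg):
    """Fraction of routed experts selected per token (shared experts excluded)."""
    return cfg["num_experts_per_tok"] / cfg["n_routed_experts"]


if __name__ == "__main__":
    print(check_config(config))
    print(active_expert_fraction(config))
```

Under these assumptions the excerpt is self-consistent: 3 dense plus 75 sparse entries match the 78 hidden layers, and 192 + 64 matches the 256-wide query/key head.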
generation_config.json
ADDED
@@ -0,0 +1,12 @@
+{
+  "_from_model_config": true,
+  "eos_token_id": [
+    154820,
+    154827,
+    154829
+  ],
+  "pad_token_id": 154820,
+  "temperature": 1.0,
+  "top_p": 0.95,
+  "transformers_version": "5.12.0"
+}
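To make the sampling fields in generation_config.json concrete, here is a generic sketch of temperature scaling followed by top-p (nucleus) filtering, plus a stop check against the listed `eos_token_id` values. It is an illustration in plain Python, not the Transformers implementation; `top_p_filter`, `sample_token`, and `is_stop_token` are hypothetical names, and only the numeric defaults come from the diff.

```python
import math
import random

# Values copied from the generation_config.json diff.
EOS_TOKEN_IDS = (154820, 154827, 154829)
TEMPERATURE = 1.0
TOP_P = 0.95


def top_p_filter(logits, temperature=TEMPERATURE, top_p=TOP_P):
    """Return {token_id: probability} for the smallest high-probability set
    whose cumulative mass reaches top_p, renormalized to sum to 1."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, rng, temperature=TEMPERATURE, top_p=TOP_P):
    """Draw one token id from the filtered distribution using the given RNG."""
    dist = top_p_filter(logits, temperature, top_p)
    ids = list(dist)
    return rng.choices(ids, weights=[dist[i] for i in ids], k=1)[0]


def is_stop_token(token_id, eos_ids=EOS_TOKEN_IDS):
    """True if generation should stop on this token id."""
    return token_id in eos_ids


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_token([2.0, 1.0, 0.5, -1.0], rng))
```

With `temperature` at 1.0 the logits are used unchanged, so in this sketch only the `top_p` cutoff alters the distribution.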
model-00001-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:004bf9404964da8ea71ea2d3ebf02148fa766b956bd4fca3f54b093e58a6a74c
+size 5342821416
model-00002-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7682416ae5caa51698994d71d011bf308016f97bc33fe2f8187f37cd73b85e39
+size 5351970840
model-00003-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3427fbf78d22254a65e9ea918ab724ce9dbb0f9231c7054fbb3b7a65e132d13c
+size 5360347320
model-00004-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c5cc75d06e0f5a693a21142594dda258dc46eb46ae14e33ab66310a2fd664c46
+size 5360347208
model-00005-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:7e5461ed71639e7496b8f4ebb71e4faffec78fdf97eeea424f8b0d06f1747402
+size 5359985352
model-00006-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:c4225900bd3fb7de29a27f0c613680a8d38c7571753560837878115c58f06715
+size 5360347320
model-00007-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:ba48cc4205a98fd1f6db7b62941439a7497346e3cb8e77f9e77f04afc094ec03
+size 5360347320
model-00008-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2f2bf12168f93157a568ef0a5e7e2e2ed107c4d6f08da234bdaccb20f69dfd50
+size 5360347144
model-00009-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b98cbef0349b6ce166add44223d299c315898f3a62d8abb2cdd6e7e2ea87f99d
+size 5366406944
model-00010-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:dd95904337bc2f7afb5815c4ec8723cadd7d2899c3ffae39c06b95c09add7088
+size 5360347312
model-00011-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b25431bc8f47afa28e7c54e7dd5c602124ba6d29eb29c1ae85803f3a2a8f4c5c
+size 5360347288
model-00012-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2b547be1d742f78230e1358808ad26b373ed6855017cb8230dc80f503d49a91e
+size 5363507640
model-00013-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0d6007e4f7e0eb084782bf2cccdd5d2d143d41939b033439e9fe3525abe85b47
+size 5363246488
model-00014-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6af3afe8e57aa38d4765ed60b3d9d3db17ac3b9edcb1221a9e3803c26186a070
+size 5360347312
model-00015-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8973e20da1e7a417d7a5c1ca937c679411e15d30e6b1c681bb48e8b1b67ec48a
+size 5360347224
model-00016-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d2933dc4032c2248ca122be4cb1432dd8dd553d9967a6a515a82485662e92f99
+size 5366406864
model-00017-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5f4bdb0a9cc4bf038705a034734212a9185f0853c06d5a6142b0b3f72720a27d
+size 5360347320
model-00018-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1fea4352bbe2a046bfbfb9abbf5eb765c731299f291e8cdd57727c64330dcc9c
+size 5360347312
model-00019-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:30e6a82321012b751a20d8d4b0ec7789fbd0f7d9793363c3aca1cfb3bac63dd2
+size 5360347152
model-00020-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2a0c14209e0e265b74560da40960bcebc851ee68c3448c66e23ab474205e61f9
+size 5359985408
model-00021-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8d938250b91052108ed2eb412269968a321aa33781120079ab3178f9db441c0a
+size 5360347320
model-00022-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b23391d705ced93248aee64102135073c3f06f26ea32ebc32fbbd48348b0b9bb
+size 5360347288
model-00023-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:de0d99516f2784543e6a2c549965e7082546f39ef10add5f881f6e09111badae
+size 5360347112
model-00024-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:622a574bb87ed480e8a7c6f5d97d96c2e746ebd9d65fd3300a5b5f06078dc630
+size 5366407008
model-00025-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9b2dd5f87ab021b5646e3fd76ac93bc92483e1c4f518acf9158c1ec2de5f95a8
+size 5360347320
model-00026-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e6c1e51ea49e1ba15100da8dda71ee2643a56ea658cfad244cf4d7ea890618a9
+size 5360347232
model-00027-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0d420c6b292f345c2f584ab3fccdc9be60f96de30e8beae7249b0ff52d80c610
+size 5366406856
model-00028-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4847b68a168e928e9a77ae9e9ad8478773618d55e4879bcdeb8466fb440022a6
+size 5360347320
model-00029-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:64087e925924da3c8ac95bd50348dd0308b5f710dc47b79542a293f288849bcc
+size 5360347320
model-00030-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bfd33908d0bf7c36e602d8eece93f146878b3298bfe4ec1b1cb29b0a4ec8f2c9
+size 5360347160
model-00031-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b6fd157724a0a1c675877cb56885a4cb0ca015d17e93ce075ece80aed7173584
+size 5366406928
model-00032-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:87863a9a6f05c76d083145a77463e84c3868787a7bf4f2fcb1ef30d2dfec9e61
+size 5360347320
model-00033-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:45dd4f6733c1bb4fb6aea8a4ee35301f5fe55b9809690f51e87558e4151eb7d2
+size 5360347304
model-00034-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6d9f46cfdc634f6ffe0391fa781aeeafbaf23acd830bb0985ececf493d4dbc57
+size 5360347112
model-00035-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bde40fe925bdee14397ed505ceb2a9f69c45f2439cee7019599c1b0dfaf53158
+size 5359985464
model-00036-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:24d6eb21ea6edfb239c196de287d526c339998721524aaa150102ddce2a5738a
+size 5360347320
model-00037-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:08673e260d26e835d4bcddb655f1cf4f8cc1600b06260a86ac92217beaee63c2
+size 5360347240
model-00038-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3eb356ec94ec4112d3b0a7faaf2b3dc62eb0d958696713b5283376d4461bcc4d
+size 5362896104
model-00039-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:3cfceae2a0d6638ad38091c7388c64b1abc5605f7b01362fdb3e0693fb18d7d2
+size 5360347312
model-00040-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:2718c2467b41bd999c66bc9ba710a5477413ad89997ce00f86ab094e28c157f0
+size 5360347320
model-00041-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:5b487ae66099fc34d92fdcbfb094fd4586c3ca65f620c27e2ccfbbb53544280d
+size 5360347200
model-00042-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:0e82fb6a23dac095364e89e0b6eb4dc9a24c8419dd3a33644e31ea92e7d2db0d
+size 5366406888
model-00043-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:8132bfcd7af3d325d68368880f34693fbfef371d613c93e6326c00f33abef751
+size 5360347320
model-00044-of-00282.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:83a40255e2242ef6d1a4b2886b9fa7912d654df34edf171b1bda9f5c07702acd
+size 5360347320
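Each .safetensors entry in this diff is a Git LFS pointer (three `key value` lines: `version`, `oid`, `size`) rather than the weights themselves. A minimal sketch of parsing such a pointer and checking a downloaded shard against it might look like the following. `parse_lfs_pointer` and `verify_stream` are hypothetical helper names, and the sample text is the pointer for model-00001-of-00282.safetensors copied from the diff.

```python
import hashlib

# Pointer text for model-00001-of-00282.safetensors, copied from the diff.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:004bf9404964da8ea71ea2d3ebf02148fa766b956bd4fca3f54b093e58a6a74c
size 5342821416
"""


def parse_lfs_pointer(text):
    """Split a pointer file into its version, hash algorithm, digest, and size."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_stream(stream, pointer, chunk_size=1 << 20):
    """Hash a binary stream in chunks and compare size and digest to the pointer.

    Reading in chunks avoids loading a multi-gigabyte shard into memory.
    """
    if pointer["algo"] != "sha256":
        raise ValueError("only sha256 pointers are handled in this sketch")
    digest = hashlib.sha256()
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
        total += len(chunk)
    return total == pointer["size"] and digest.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    print(parse_lfs_pointer(POINTER_TEXT))
```

To check a real shard, one would open the downloaded file in binary mode and pass the file object to `verify_stream` together with the parsed pointer.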