QuantTrio/GLM-4.6-GPTQ-Int4-Int8Mix


tclf90 committed on Oct 3, 2025

Commit b4f3c57 · verified · 1 parent: 318fee0

Update README.md

Files changed (1)

README.md +4 -4


@@ -6,11 +6,11 @@ tags:
 - GPTQ
 - vLLM
 base_model:
-- ZhipuAI/GLM-4.6
+- zai-org/GLM-4.6
 base_model_relation: quantized
 ---
 # GLM-4.6-GPTQ-Int4-Int8Mix
-Base Model: [ZhipuAI/GLM-4.6](https://www.modelscope.cn/models/ZhipuAI/GLM-4.6)
+Base Model: [zai-org/GLM-4.6](https://huggingface.co/zai-org/GLM-4.6)
 ### 【Dependencies / Installation】
 As of **2025-10-01**, create a fresh Python environment and run:
@@ -26,7 +26,7 @@ otherwise the expert tensors couldn’t be evenly sharded across GPU devices.</i
 ```
 CONTEXT_LENGTH=32768
 vllm serve \
-    tclf90/GLM-4.6-GPTQ-Int4-Int8Mix \
+    QuantTrio/GLM-4.6-GPTQ-Int4-Int8Mix \
     --served-model-name My_Model \
     --enable-auto-tool-choice \
     --tool-call-parser glm45 \
@@ -57,7 +57,7 @@ vllm serve \
 ### 【Model Download】
 ```python
 from modelscope import snapshot_download
-snapshot_download('tclf90/GLM-4.6-GPTQ-Int4-Int8Mix', cache_dir="your_local_path")
+snapshot_download('QuantTrio/GLM-4.6-GPTQ-Int4-Int8Mix', cache_dir="your_local_path")
 ```
 ### 【Overview】
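
The four changed lines in this commit amount to a namespace rename: the quantized repo moves from `tclf90/` to `QuantTrio/`, and the base model reference moves from `ZhipuAI/GLM-4.6` on ModelScope to `zai-org/GLM-4.6` on Hugging Face. A minimal sketch of applying the same rename to other text (for example, a local launch script that still uses the old IDs) might look like the following. The `RENAMES` table and the `update_repo_ids` helper are hypothetical names introduced here for illustration; they are not part of the repository.

```python
# Old -> new identifiers, copied from the removed/added lines of this commit.
# The full URL is listed first so it is rewritten before the bare repo ID.
RENAMES = {
    "https://www.modelscope.cn/models/ZhipuAI/GLM-4.6": "https://huggingface.co/zai-org/GLM-4.6",
    "ZhipuAI/GLM-4.6": "zai-org/GLM-4.6",
    "tclf90/GLM-4.6-GPTQ-Int4-Int8Mix": "QuantTrio/GLM-4.6-GPTQ-Int4-Int8Mix",
}


def update_repo_ids(text: str) -> str:
    """Return `text` with every old identifier replaced by its new one."""
    for old, new in RENAMES.items():  # dicts preserve insertion order
        text = text.replace(old, new)
    return text


if __name__ == "__main__":
    sample = "vllm serve tclf90/GLM-4.6-GPTQ-Int4-Int8Mix --served-model-name My_Model"
    print(update_repo_ids(sample))
```

Plain string replacement is enough here because each old identifier is a fixed literal; a regex would only be needed if the IDs varied in form.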