amd/GLM-5-NVFP4

+MIT License
+Copyright (c) 2026 Zhipu AI
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md CHANGED

@@ -11,9 +11,12 @@ base_model:
   - **Output:** Text
 - **Supported Hardware Microarchitecture:** AMD MI300/MI350/MI355 (emulation)
 - **ROCm:** 7.2.2
+- **PyTorch**: 2.10.0
+- **Transformers**: 5.2.0
 - **Operating System(s):** Linux
 - **Inference Engine:** [vLLM](https://docs.vllm.ai/en/latest/)
 - **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html) (V0.12)
+  - **Quantized layers:** `experts` and `shared_experts`
   - **Weight quantization:** MOE-only, NVFP4, Static
   - **Activation quantization:** MOE-only, NVFP4, Dynamic
 - **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)
@@ -28,8 +31,8 @@ The model was quantized from [zai-org/GLM-5](https://huggingface.co/zai-org/GLM-
 sudo sysctl -w vm.max_map_count=4194304
 cd Quark/examples/torch/language_modeling/llm_ptq/
 export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
-export MODEL_DIR=/zai-org/GLM-5
-export output_dir=/amd/GLM-5-NVFP4
+export MODEL_DIR=zai-org/GLM-5
+export output_dir=amd/GLM-5-NVFP4
 exclude_layers="*self_attn* *mlp.gate *lm_head *mlp.gate_proj *mlp.up_proj *mlp.down_proj"
 python3 quantize_quark.py --model_dir $MODEL_DIR \
                           --quant_scheme nvfp4 \
@@ -93,7 +96,7 @@ export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
 export PYTORCH_ALLOC_CONF=expandable_segments:True
 lm_eval \
   --model vllm \
-  --model_args pretrained=/amd/GLM-5-NVFP4,tensor_parallel_size=8,max_model_len=4096,gpu_memory_utilization=0.90,enforce_eager=True,max_gen_toks=2048,kv_cache_dtype=bfloat16,trust_remote_code=True \
+  --model_args pretrained=amd/GLM-5-NVFP4,tensor_parallel_size=8,max_model_len=4096,gpu_memory_utilization=0.90,enforce_eager=True,max_gen_toks=2048,kv_cache_dtype=bfloat16,trust_remote_code=True \
   --tasks gsm8k \
   --num_fewshot 5 \
   --batch_size auto
@@ -103,4 +106,4 @@ lm_eval \
 # License
-Modifications Copyright(c) 2026 Advanced Micro Devices, Inc. All rights reserved.
+Modifications Copyright(c) 2026 Advanced Micro Devices, Inc. All rights reserved.
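
The new "Quantized layers" entry can be cross-checked against the `exclude_layers` patterns in the quantization command. The sketch below is only an illustration: it assumes the patterns are applied as shell-style globs against full module names (here via Python's `fnmatch`, which may not be exactly how Quark matches them), and the module names in `CANDIDATES` are hypothetical, not read from the GLM-5 checkpoint.

```python
from fnmatch import fnmatchcase

# Patterns copied from the exclude_layers variable in the README command.
EXCLUDE_LAYERS = (
    "*self_attn* *mlp.gate *lm_head "
    "*mlp.gate_proj *mlp.up_proj *mlp.down_proj"
).split()

# Hypothetical module names for illustration; real GLM-5 names may differ.
CANDIDATES = [
    "model.layers.0.self_attn.q_proj",
    "model.layers.0.mlp.gate_proj",
    "model.layers.5.mlp.gate",
    "model.layers.5.mlp.experts.0.gate_proj",
    "model.layers.5.mlp.shared_experts.down_proj",
    "lm_head",
]


def is_excluded(name, patterns=EXCLUDE_LAYERS):
    """Return True if any glob pattern matches the full module name."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


# Modules that no exclude pattern matches are the ones left to quantize.
quantized = [name for name in CANDIDATES if not is_excluded(name)]

for name in CANDIDATES:
    status = "excluded" if is_excluded(name) else "quantized"
    print(f"{name}: {status}")
```

Under that glob assumption, a pattern such as `*mlp.gate_proj` only matches names that end in `mlp.gate_proj`, so dense MLP projections are excluded while projections nested under `experts` or `shared_experts` fall through and remain quantized, which is consistent with the "MOE-only" wording in the README.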