add model card and update tokenizer

#3
Files changed (2)
  1. README.md +105 -5
  2. tokenizer.json +2 -2
README.md CHANGED
@@ -1,9 +1,109 @@
  ---
- license: mit
+ license: other
+ license_name: modified-mit
+ license_link: LICENSE
+ base_model:
+ - zai-org/GLM-5
  ---

- **Disclaimer**
+ # Model Overview

- This model is provided for research and evaluation purposes only.
- Quantization may introduce accuracy or behavioral differences compared to the original model.
- Users are responsible for validating the model in their own environments and complying with the original model license.
+ - **Model Architecture:** GLM-5
+ - **Input:** Text
+ - **Output:** Text
+ - **Supported Hardware Microarchitecture:** AMD MI300/MI350/MI355 (emulation)
+ - **ROCm:** 7.2.2
+ - **Operating System(s):** Linux
+ - **Inference Engine:** [vLLM](https://docs.vllm.ai/en/latest/)
+ - **Model Optimizer:** [AMD-Quark](https://quark.docs.amd.com/latest/index.html) (V0.12)
+ - **Weight quantization:** MoE-only, NVFP4, Static
+ - **Activation quantization:** MoE-only, NVFP4, Dynamic
+ - **Calibration Dataset:** [Pile](https://huggingface.co/datasets/mit-han-lab/pile-val-backup)
+
+ This model was built from the GLM-5 model by applying [AMD-Quark](https://quark.docs.amd.com/latest/index.html) for NVFP4 quantization.
+ # Model Quantization
+
+ The model was quantized from [zai-org/GLM-5](https://huggingface.co/zai-org/GLM-5) using [AMD-Quark](https://quark.docs.amd.com/latest/index.html). Only the MoE layers are quantized: their weights and activations are quantized to NVFP4.
+
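To make the NVFP4 step more concrete, here is a minimal fake-quantization sketch in plain Python. It is not AMD-Quark code. It assumes the commonly described NVFP4 layout: 4-bit E2M1 values grouped in blocks of 16 that share one scale. For simplicity the block scale stays in full precision, whereas real NVFP4 stores it as FP8 (E4M3) alongside a per-tensor FP32 scale. The function names are hypothetical.

```python
# Illustrative NVFP4-style fake quantization (not AMD-Quark code).
# Assumptions: 16-element blocks, E2M1 magnitudes, full-precision block scale.

E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # representable magnitudes
BLOCK_SIZE = 16


def quantize_block(block):
    """Return (scale, dequantized values) for one block of floats."""
    amax = max(abs(x) for x in block)
    if amax == 0.0:
        return 0.0, [0.0] * len(block)
    scale = amax / E2M1_GRID[-1]  # map the largest magnitude onto 6.0
    dequantized = []
    for x in block:
        # Snap the scaled magnitude to the nearest representable E2M1 value.
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        dequantized.append(mag * scale if x >= 0 else -mag * scale)
    return scale, dequantized


def fake_quantize(values):
    """Quantize a flat list block by block and return the dequantized list."""
    result = []
    for start in range(0, len(values), BLOCK_SIZE):
        _, deq = quantize_block(values[start:start + BLOCK_SIZE])
        result.extend(deq)
    return result
```

In this picture, "Static" weight quantization means the scales are computed once when the checkpoint is produced, while "Dynamic" activation quantization means the same computation runs on each activation tensor at inference time.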
+ **Quantization scripts:**
+ ```
+ sudo sysctl -w vm.max_map_count=4194304
+ cd Quark/examples/torch/language_modeling/llm_ptq/
+ export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+ export MODEL_DIR=/zai-org/GLM-5
+ export output_dir=/amd/GLM-5-NVFP4
+ exclude_layers="*self_attn* *mlp.gate *lm_head *mlp.gate_proj *mlp.up_proj *mlp.down_proj"
+ python3 quantize_quark.py --model_dir $MODEL_DIR \
+ --quant_scheme nvfp4 \
+ --num_calib_data 128 \
+ --exclude_layers $exclude_layers \
+ --model_export hf_format \
+ --output_dir $output_dir \
+ --multi_gpu balanced
+ ```
+
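The `exclude_layers` patterns decide which modules stay unquantized. The sketch below shows how such patterns could leave only the MoE expert projections for NVFP4. It assumes shell-style glob matching (Python's `fnmatch`), which may differ from how AMD-Quark actually resolves the patterns, and the module names are hypothetical examples rather than names read from the GLM-5 checkpoint.

```python
from fnmatch import fnmatchcase

# Patterns copied from the exclude_layers variable in the script above.
EXCLUDE_PATTERNS = (
    "*self_attn* *mlp.gate *lm_head *mlp.gate_proj *mlp.up_proj *mlp.down_proj"
).split()

# Hypothetical module names, chosen only to illustrate the matching.
LAYER_NAMES = [
    "model.layers.0.self_attn.o_proj",            # attention
    "model.layers.0.mlp.gate_proj",               # dense MLP
    "model.layers.5.mlp.gate",                    # MoE router
    "model.layers.5.mlp.experts.0.gate_proj",     # routed expert
    "model.layers.5.mlp.shared_experts.up_proj",  # shared expert
    "lm_head",
]


def is_excluded(name, patterns=EXCLUDE_PATTERNS):
    """True if any glob pattern matches the full module name."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


# Modules that no pattern matches are the ones left for quantization.
quantized = [name for name in LAYER_NAMES if not is_excluded(name)]
```

Under these assumptions, only the two expert projections end up in `quantized`. The dense `mlp.gate_proj` and the router `mlp.gate` are excluded because the patterns anchor on the `mlp.` prefix directly before the projection name.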
+ # Deployment
+ ### Use with vLLM
+
+ This model can be deployed efficiently using the [vLLM](https://docs.vllm.ai/en/latest/) backend.
+
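The model card gives no deployment command, so the following is only a sketch of offline inference with vLLM's Python API. The engine settings are borrowed from the evaluation command later in this README, not from a tested deployment recipe, and the helper names `engine_kwargs` and `generate` are hypothetical. The `VLLM_ROCM_*` environment variables from the evaluation section would presumably need to be exported first.

```python
def engine_kwargs(model_path="/amd/GLM-5-NVFP4"):
    """Engine settings mirroring the lm_eval model_args in this README."""
    return {
        "model": model_path,
        "tensor_parallel_size": 8,
        "max_model_len": 4096,
        "gpu_memory_utilization": 0.90,
        "enforce_eager": True,
        "kv_cache_dtype": "bfloat16",
        "trust_remote_code": True,
    }


def generate(prompts, max_tokens=128):
    """Run greedy generation; requires a ROCm build of vLLM and eight GPUs."""
    # Imported lazily so the settings above can be inspected without vLLM.
    from vllm import LLM, SamplingParams

    llm = LLM(**engine_kwargs())
    params = SamplingParams(temperature=0.0, max_tokens=max_tokens)
    return [out.outputs[0].text for out in llm.generate(prompts, params)]
```

Calling `generate(["What is 12 * 7?"])` on a suitable machine would load the checkpoint across eight GPUs and return one completion per prompt.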
+ ## Evaluation
+ The model was evaluated on the GSM8K benchmark.
+
+ ### Accuracy
+
+ <table>
+ <tr>
+ <td><strong>Benchmark</strong>
+ </td>
+ <td><strong>GLM-5</strong>
+ </td>
+ <td><strong>GLM-5-NVFP4 (this model)</strong>
+ </td>
+ <td><strong>Recovery</strong>
+ </td>
+ </tr>
+ <tr>
+ <td>GSM8K (flexible-extract)
+ </td>
+ <td>95.45
+ </td>
+ <td>95.22
+ </td>
+ <td>99.75%
+ </td>
+ </tr>
+
+
+ </table>
+
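The Recovery column is presumably the quantized score expressed as a percentage of the baseline score. The model card does not state the formula, so the sketch below is an assumption, and `recovery` is a hypothetical helper.

```python
def recovery(quantized_score, baseline_score):
    """Quantized accuracy as a percentage of the baseline accuracy."""
    return 100.0 * quantized_score / baseline_score


# Scores taken from the GSM8K (flexible-extract) row of the table.
value = recovery(95.22, 95.45)
print(f"{value:.2f}%")  # prints 99.76%
```

From the rounded table values this gives 99.76% when rounded to two decimals. The 99.75% in the table may reflect truncation or scores with more digits than the table shows.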
+ ### Reproduction
+
+ The GSM8K result was obtained with the `lm-evaluation-harness` framework, running in the Docker image `rocm/vllm-dev:nightly_main_20260603`.
+
+ First install `lm-eval` (version 0.4.12) in the container.
+ ```
+ pip install lm-eval
+ pip install "lm-eval[api]"
+ ```
+ #### Evaluating the model
+ ```
+ export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1
+ export VLLM_ROCM_USE_AITER=1
+ export VLLM_ROCM_USE_AITER_MLA=1
+ export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
+ export PYTORCH_ALLOC_CONF=expandable_segments:True
+ lm_eval \
+ --model vllm \
+ --model_args pretrained=/amd/GLM-5-NVFP4,tensor_parallel_size=8,max_model_len=4096,gpu_memory_utilization=0.90,enforce_eager=True,max_gen_toks=2048,kv_cache_dtype=bfloat16,trust_remote_code=True \
+ --tasks gsm8k \
+ --num_fewshot 5 \
+ --batch_size auto
+
+ ```
+
+
+
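The same evaluation can likely be driven from Python through `lm_eval.simple_evaluate`. The sketch below assumes lm-eval 0.4.x and that the environment variables from the shell snippet are already exported. The helpers `model_args` and `run_gsm8k` are hypothetical, and the result key in the usage note is the format lm-eval 0.4.x is understood to report, not something the model card confirms.

```python
def model_args(pretrained="/amd/GLM-5-NVFP4"):
    """Build the comma-separated model_args string used by the CLI call."""
    settings = {
        "pretrained": pretrained,
        "tensor_parallel_size": "8",
        "max_model_len": "4096",
        "gpu_memory_utilization": "0.90",
        "enforce_eager": "True",
        "max_gen_toks": "2048",
        "kv_cache_dtype": "bfloat16",
        "trust_remote_code": "True",
    }
    return ",".join(f"{key}={value}" for key, value in settings.items())


def run_gsm8k():
    """Run 5-shot GSM8K; needs lm-eval, vLLM, and the GPUs described above."""
    # Imported lazily so model_args() can be checked without lm-eval installed.
    import lm_eval

    results = lm_eval.simple_evaluate(
        model="vllm",
        model_args=model_args(),
        tasks=["gsm8k"],
        num_fewshot=5,
        batch_size="auto",
    )
    return results["results"]["gsm8k"]
```

The returned dictionary should contain the flexible-extract score under a key such as `exact_match,flexible-extract`.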
+ # License
+ Modifications Copyright (c) 2026 Advanced Micro Devices, Inc. All rights reserved.
+
tokenizer.json CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:47757b9678da19e468edb3ae37a853996599945b5006914e5b088aff30002386
- size 20217707
+ oid sha256:19e773648cb4e65de8660ea6365e10acca112d42a854923df93db4a6f333a82d
+ size 20217442
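Because `tokenizer.json` is stored through Git LFS, the diff only shows the pointer changing. A downloaded copy can be checked against the new pointer by comparing its byte size and SHA-256 digest, as in this sketch. The helper names are hypothetical; the pointer text is copied from the diff above.

```python
import hashlib
import os

NEW_POINTER = """version https://git-lfs.github.com/spec/v1
oid sha256:19e773648cb4e65de8660ea6365e10acca112d42a854923df93db4a6f333a82d
size 20217442
"""


def parse_pointer(text):
    """Return (algorithm, hex digest, size in bytes) from an LFS pointer."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algorithm, digest = fields["oid"].split(":", 1)
    return algorithm, digest, int(fields["size"])


def matches_pointer(path, pointer_text, chunk_size=1 << 20):
    """True if the file at path has the size and SHA-256 the pointer records."""
    algorithm, digest, size = parse_pointer(pointer_text)
    if algorithm != "sha256" or os.path.getsize(path) != size:
        return False
    sha = hashlib.sha256()
    with open(path, "rb") as handle:
        # Hash in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            sha.update(chunk)
    return sha.hexdigest() == digest
```

For example, `matches_pointer("tokenizer.json", NEW_POINTER)` should return `True` for a file fetched from this revision.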