Commit 8e7a305 · verified · 1 Parent(s): 8677dbb
Committed by JunHowie

Upload README.md

Files changed (1)
  1. README.md +8 -6
README.md CHANGED
@@ -39,10 +39,11 @@ The default `reasoning_effort` is changed to `medium-high` to reduce thinking-to

  Accuracy reference, using the [SGLang GLM-5.2 FP8 H200 / default / low-latency / single-node AIME25 recipe](https://lmsysorg.mintlify.app/cookbook/autoregressive/GLM/GLM-5.2#hw=h200&variant=default&quant=fp8&strategy=low-latency&nodes=single):

- | Model | Quantization | Reasoning effort | AIME25 pass@1 |
- |-------|--------------|------------------|---------------|
- | ZhipuAI/GLM-5.2-FP8 | FP8 | `max` | `87.7%` |
- | tclf90/GLM-5.2-Int4-Int8Mix | Int4-Int8Mix, W4A16/W8A16 | `medium-high` | `86.46%` |
+ | Model | Runtime | Quantization | Reasoning effort | AIME25 pass@1 |
+ |-------|---------|--------------|------------------|---------------|
+ | ZhipuAI/GLM-5.2-FP8 | SGLang | FP8 | `max` | `87.7%` |
+ | tclf90/GLM-5.2-Int4-Int8Mix | vLLM | Int4-Int8Mix, W4A16/W8A16 | `max` | `92.92%` |
+ | tclf90/GLM-5.2-Int4-Int8Mix | vLLM | Int4-Int8Mix, W4A16/W8A16 | `medium-high` | `86.46%` |

  This is a lightweight reproduction reference rather than a full formal benchmark.

@@ -50,15 +51,16 @@ This is a lightweight reproduction reference rather than a full formal benchmark

  ```
  vllm==0.23.0
+ transformers==5.12.1
  ```

- As of **2026-06-21**, this model has been verified on an 8 x H200 machine with a Python 3.12 virtual environment and vLLM 0.23.0.
+ As of **2026-06-21**, this model has been verified on an 8 x H200 machine with a Python 3.12 virtual environment, vLLM 0.23.0, and Transformers 5.12.1.

  Create a fresh Python environment and install vLLM:
  ```
  python3.12 -m venv venv
  source venv/bin/activate
- pip install vllm==0.23.0
+ pip install vllm==0.23.0 transformers==5.12.1
  ```

  [vLLM Official Guide](https://recipes.vllm.ai/zai-org/GLM-5.2)
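
Because this commit adds a second pinned dependency, it may be useful to confirm that an existing environment matches both pins before serving the model. The following is a minimal sketch, not part of the README: it uses only the standard-library `importlib.metadata` module, and the names `PINS` and `find_mismatches` are hypothetical helpers introduced here for illustration. It compares version strings exactly, so a build with a local suffix would be reported as a mismatch.

```python
from importlib import metadata

# Pins copied from the updated README (assumed to be the full set that matters).
PINS = {"vllm": "0.23.0", "transformers": "5.12.1"}


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (expected, installed_or_None)} for every unmet pin."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        if installed != expected:  # exact string comparison, no range matching
            mismatches[name] = (expected, installed)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(PINS)
    for name, (expected, installed) in problems.items():
        print(f"{name}: expected {expected}, found {installed or 'not installed'}")
    if not problems:
        print("All pinned versions match.")
```

Run it inside the activated `venv`; an empty result suggests the environment matches the versions the README reports as verified.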