⛔️ DO NOT DOWNLOAD — UNTESTED CONVERSION ⛔️
This model has NOT been tested. Do not download or use it yet.
This is an experimental weight-only NVFP4 conversion of zai-org/GLM-5.2. It has not been validated for correctness: it has not been loaded in an inference engine, and no accuracy, perplexity, or generation checks have been run. The NVFP4 packing convention used here has not been confirmed against any loader (vLLM / TensorRT-LLM / transformers). There is no guarantee that this checkpoint loads or produces correct output. It is published only as a work-in-progress artifact. A future revision of this card will remove this warning once the conversion has been verified. Until then, please use the original zai-org/GLM-5.2.
What this is
A weight-only NVFP4 quantization of GLM-5.2 (~753B-parameter MoE), produced by a CPU streaming converter:
- 2-D linear weights → E2M1 (4-bit float), packed 2 values/byte
- per-16-element block scales → FP8 (E4M3)
- per-tensor global scale → FP32
- Dequant: W ≈ e2m1_value × weight_scale ÷ weight_global_scale
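The scheme above can be illustrated with a pure-Python round trip on a flat list of weights. This is a minimal sketch, not the converter used for this checkpoint. The function names are hypothetical, and three details are assumptions rather than facts about this conversion: the global scale maps the tensor's absolute maximum onto 448 × 6 (the largest FP8 E4M3 value times the largest E2M1 value), values are rounded to the nearest representable code, and element 2i goes in the low nibble of each byte (the card notes that the real packing convention is unconfirmed).

```python
# Minimal NVFP4 weight-only round trip (illustrative sketch, stdlib only).
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # 3-bit magnitude codes
E2M1_MAX = 6.0
E4M3_MAX = 448.0
BLOCK = 16


def _e4m3_values():
    """All finite non-negative FP8 E4M3 values (the 'fn' variant), sorted."""
    vals = set()
    for e in range(16):
        for m in range(8):
            if e == 15 and m == 7:
                continue  # this bit pattern is NaN in E4M3
            if e == 0:
                vals.add(m / 8 * 2.0 ** -6)  # subnormals
            else:
                vals.add((1 + m / 8) * 2.0 ** (e - 7))
    return sorted(vals)


E4M3_VALUES = _e4m3_values()


def round_to_e4m3(x):
    """Nearest E4M3 value to a non-negative x, saturating at 448."""
    x = min(x, E4M3_MAX)
    return min(E4M3_VALUES, key=lambda v: abs(v - x))


def encode_e2m1(x):
    """4-bit E2M1 code for x: bit 3 is the sign, bits 0-2 index the magnitude.

    Magnitudes are clamped to 6; ties resolve to the smaller magnitude.
    """
    sign = 0x8 if x < 0 else 0x0
    mag = min(abs(x), E2M1_MAX)
    code = min(range(8), key=lambda i: abs(E2M1_MAGNITUDES[i] - mag))
    return sign | code


def decode_e2m1(code):
    mag = E2M1_MAGNITUDES[code & 0x7]
    return -mag if code & 0x8 else mag


def quantize_nvfp4(weights):
    """Quantize a flat list of weights whose length is a multiple of 16.

    Returns (packed, block_scales, global_scale) so that
    w ~= e2m1_value * block_scale / global_scale.
    """
    amax = max(abs(v) for v in weights) or 1.0
    # Assumption: map the tensor amax onto the largest FP8 x FP4 product.
    global_scale = E4M3_MAX * E2M1_MAX / amax
    codes, scales = [], []
    for start in range(0, len(weights), BLOCK):
        block = weights[start:start + BLOCK]
        block_amax = max(abs(v) for v in block)
        scale = round_to_e4m3(block_amax / E2M1_MAX * global_scale)
        scales.append(scale)
        mult = scale / global_scale  # effective per-block multiplier
        codes.extend(encode_e2m1(v / mult) if mult > 0 else 0 for v in block)
    # Assumption: element 2i in the low nibble, element 2i+1 in the high nibble.
    packed = bytes(codes[i] | (codes[i + 1] << 4) for i in range(0, len(codes), 2))
    return packed, scales, global_scale


def dequantize_nvfp4(packed, scales, global_scale):
    """Invert quantize_nvfp4: W ~= e2m1_value * weight_scale / weight_global_scale."""
    out = []
    for i, byte in enumerate(packed):
        scale = scales[2 * i // BLOCK]  # both nibbles share one 16-element block
        for code in (byte & 0x0F, byte >> 4):
            out.append(decode_e2m1(code) * scale / global_scale)
    return out
```

For example, quantizing `[6.0, -3.0, 1.5, 0.5] + [0.0] * 12 + [1.0] * 16` gives a global scale of 448 and block scales `[448.0, 72.0]`. The first block round-trips exactly, while the all-1.0 block comes back as about 0.964, because its ideal scale of 74.67 rounds to the E4M3 value 72. Storage is 4 bits per weight plus one 8-bit scale per 16 weights, i.e. 4.5 bits per quantized weight, ignoring the single FP32 global scale per tensor.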
Not quantized (kept in bf16): token embeddings, final/layer norms, the MoE router gate, lm_head, and all biases.
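The selection rule can be expressed as a filter on tensor name and shape. This is a sketch under stated assumptions: the name patterns are hypothetical Hugging Face-style names (`model.embed_tokens.weight`, `mlp.gate.weight` for the router, `lm_head.weight`, `*layernorm*` / `model.norm`) that have not been checked against the actual GLM-5.2 checkpoint, and `should_quantize` is an illustrative helper, not part of the converter.

```python
import re

# Hypothetical name patterns for tensors kept in bf16 (not verified against GLM-5.2).
KEEP_BF16_PATTERNS = [
    r"embed_tokens",     # token embeddings
    r"norm",             # final norm and per-layer norms
    r"\.mlp\.gate\.",    # MoE router gate (does not match gate_proj)
    r"lm_head",          # output projection
]


def should_quantize(name, shape):
    """Return True if a tensor would be converted to NVFP4 under the card's rules."""
    if len(shape) != 2:
        return False  # only 2-D linear weights are quantized
    if not name.endswith(".weight"):
        return False  # excludes biases and any other non-weight tensors
    return not any(re.search(p, name) for p in KEEP_BF16_PATTERNS)
```

Expert and attention projections pass the filter, while the router gate, embeddings, lm_head, norms, and biases are left untouched. Checking the shape first keeps 1-D tensors out regardless of their names.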
No calibration data was used (weight-only / data-free).
Provenance
- Base model: zai-org/GLM-5.2 (MIT)
- Conversion: weight-only NVFP4, data-free, CPU streaming
- Status: experimental, unverified
Model tree for justinjja/GLM-5.2-NVFP4
Base model
zai-org/GLM-5.2