⛔️ DO NOT DOWNLOAD — UNTESTED CONVERSION ⛔️

This model has NOT been tested. Do not download or use it yet.

This is an experimental weight-only NVFP4 conversion of zai-org/GLM-5.2. It has not been validated for correctness: it has not been loaded in any inference engine, and no accuracy, perplexity, or generation checks have been run. The NVFP4 packing convention used here has not been confirmed against any loader (vLLM, TensorRT-LLM, or transformers).

There is no guarantee this checkpoint loads or produces correct output. It is published only as a work-in-progress artifact. A future revision of this card will remove this warning once the conversion has been verified. Until then, please use the original zai-org/GLM-5.2.


What this is

A weight-only NVFP4 quantization of GLM-5.2 (~753B-parameter MoE), produced by a CPU streaming converter:

  • 2-D linear weights → E2M1 (4-bit float), packed 2 values/byte
  • per-16-element block scales → FP8 (E4M3)
  • per-tensor global scale → FP32
  • Dequant: W ≈ e2m1_value × weight_scale ÷ weight_global_scale
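The dequantization formula above can be sketched in plain Python as follows. This is an illustrative sketch, not the converter's code: the function names are hypothetical, and the nibble order (low nibble holds the even-indexed element) is an assumption, since the packing convention of this checkpoint is unconfirmed. The E2M1 value table and the E4M3 decoding follow the standard definitions of those formats.

```python
# E2M1 magnitudes indexed by the low 3 bits of a 4-bit code; bit 3 is the sign.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
BLOCK = 16  # elements sharing one FP8 block scale


def decode_e2m1(code):
    """Decode one 4-bit E2M1 code to a float."""
    mag = E2M1_MAGNITUDES[code & 0x7]
    return -mag if code & 0x8 else mag


def decode_e4m3(byte):
    """Decode one FP8 E4M3 byte: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0xF and man == 0x7:
        return float("nan")  # the only NaN encoding; E4M3 has no infinities
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6  # subnormal
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


def unpack_nibbles(packed):
    """Split each byte into two 4-bit codes.

    ASSUMPTION: the low nibble is the even-indexed element. The card states
    that the packing convention has not been confirmed against any loader.
    """
    codes = []
    for b in packed:
        codes.append(b & 0xF)
        codes.append(b >> 4)
    return codes


def dequantize_row(packed, scale_bytes, global_scale):
    """W ~= e2m1_value * weight_scale / weight_global_scale, one row at a time."""
    out = []
    for i, code in enumerate(unpack_nibbles(packed)):
        block_scale = decode_e4m3(scale_bytes[i // BLOCK])
        out.append(decode_e2m1(code) * block_scale / global_scale)
    return out
```

For example, `dequantize_row(bytes([0x21] * 8), bytes([0x40]), 4.0)` decodes one 16-element block whose codes alternate between 0.5 and 1.0, with a block scale of 2.0 and a global scale of 4.0.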

Not quantized (kept in bf16): token embeddings, final/layer norms, the MoE router gate, lm_head, and all biases.
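The selection rule described above (quantize 2-D linear weights, keep everything else in bf16) could be expressed as a simple predicate. This is a minimal sketch: `should_quantize` and the name fragments in `SKIP_FRAGMENTS` are hypothetical and follow common Hugging Face checkpoint naming, not tensor names confirmed for GLM-5.2.

```python
# HYPOTHETICAL name fragments for tensors kept in bf16; the real GLM-5.2
# tensor names are not given in this card and may differ.
SKIP_FRAGMENTS = ("embed_tokens", "norm", "mlp.gate.", "lm_head")


def should_quantize(name, shape):
    """Return True only for 2-D linear weights that are not on the skip list."""
    if len(shape) != 2:
        return False  # norms and biases are 1-D
    if name.endswith(".bias"):
        return False
    return not any(fragment in name for fragment in SKIP_FRAGMENTS)
```

With these assumed names, an expert projection such as `model.layers.3.mlp.experts.0.gate_proj.weight` is quantized, while the router `model.layers.3.mlp.gate.weight` is left untouched.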

No calibration data was used (weight-only / data-free).
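Data-free here means that every scale can be derived from the weights alone. The sketch below shows one common way to do that, consistent with the dequantization formula W ≈ e2m1_value × weight_scale ÷ weight_global_scale. It is an assumption, not the converter's confirmed method: the global scale is chosen so the largest block scale lands at the FP8 E4M3 maximum (448), and the rounding of block scales to FP8 is omitted for brevity. `quantize_row` is a hypothetical name.

```python
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_MAX = 6.0    # largest E2M1 magnitude
E4M3_MAX = 448.0  # largest finite FP8 E4M3 value
BLOCK = 16


def quantize_row(weights):
    """Quantize one row (length a multiple of 16) using only its own statistics.

    Returns (codes, block_scales, global_scale). Block scales are left as
    Python floats; a real converter would round them to FP8 E4M3.
    """
    amax = max(abs(w) for w in weights)
    # ASSUMPTION: map the row's largest block scale onto the FP8 maximum.
    global_scale = (E4M3_MAX * E2M1_MAX) / amax if amax > 0 else 1.0
    codes, block_scales = [], []
    for start in range(0, len(weights), BLOCK):
        block = weights[start:start + BLOCK]
        block_amax = max(abs(w) for w in block)
        scale = block_amax / E2M1_MAX * global_scale
        block_scales.append(scale)
        for w in block:
            # Invert the dequant formula, then snap to the nearest E2M1 magnitude.
            target = abs(w) * global_scale / scale if scale > 0 else 0.0
            idx = min(range(8), key=lambda i: abs(E2M1_MAGNITUDES[i] - target))
            codes.append(idx | (0x8 if w < 0 else 0))
    return codes, block_scales, global_scale
```

Because no activations are observed, the quantization error depends only on how well each 16-element block fits the eight E2M1 magnitudes after scaling.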

Provenance

  • Base model: zai-org/GLM-5.2 (MIT)
  • Conversion: weight-only NVFP4, data-free, CPU streaming
  • Status: experimental, unverified
  • Safetensors: model size 425B params; tensor types F32 · BF16 · F8_E4M3 · U8

Model tree for justinjja/GLM-5.2-NVFP4: base model zai-org/GLM-5.2; this model is one of its 27 quantized variants.