⛔️ DO NOT DOWNLOAD — UNTESTED CONVERSION ⛔️

This model has NOT been tested. Do not download or use it yet.

This is an experimental weight-only NVFP4 conversion of zai-org/GLM-5.2. It has not been validated for correctness: it has not been loaded in any inference engine, and no accuracy, perplexity, or generation checks have been run. The NVFP4 packing convention used here has not been confirmed against any loader (vLLM, TensorRT-LLM, or transformers).

There is no guarantee this checkpoint loads or produces correct output. It is published only as a work-in-progress artifact. A future revision of this card will remove this warning once the conversion has been verified. Until then, please use the original zai-org/GLM-5.2.

What this is

A weight-only NVFP4 quantization of GLM-5.2 (~753B-parameter MoE), produced by a CPU streaming converter:

2-D linear weights → E2M1 (4-bit float), packed 2 values/byte
per-16-element block scales → FP8 (E4M3)
per-tensor global scale → FP32
Dequant: W ≈ e2m1_value × weight_scale ÷ weight_global_scale (weight_scale is the per-block FP8 scale, weight_global_scale the per-tensor FP32 scale)
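
As a rough illustration of the dequant formula above, here is a minimal pure-Python sketch. It is not the converter's code and has not been checked against this checkpoint. The function names are hypothetical, and the nibble order (low nibble holds the even-indexed element) is an assumption, since the packing convention here is unconfirmed. The E4M3 decoder assumes the finite-only FP8 variant with exponent bias 7.

```python
import math

# The eight non-negative magnitudes representable in E2M1 (codes 0..7).
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def decode_e2m1(code: int) -> float:
    """Decode a 4-bit E2M1 code: bit 3 is the sign, bits 0-2 select the magnitude."""
    magnitude = E2M1_MAGNITUDES[code & 0x7]
    return -magnitude if code & 0x8 else magnitude


def decode_e4m3(byte: int) -> float:
    """Decode an FP8 E4M3 byte (finite-only variant): 1 sign, 4 exponent (bias 7), 3 mantissa bits."""
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> 3) & 0xF
    mantissa = byte & 0x7
    if exponent == 0xF and mantissa == 0x7:
        return math.nan  # the only NaN encoding in this variant
    if exponent == 0:
        return sign * (mantissa / 8.0) * 2.0 ** -6  # subnormal
    return sign * (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


def dequantize_row(packed: bytes, block_scales: bytes, global_scale: float,
                   block_size: int = 16) -> list:
    """Apply W ≈ e2m1_value × weight_scale ÷ weight_global_scale to one packed row."""
    values = []
    for i, byte in enumerate(packed):
        # ASSUMPTION: low nibble = even-indexed element, high nibble = odd-indexed.
        for j, code in enumerate((byte & 0xF, byte >> 4)):
            index = 2 * i + j
            scale = decode_e4m3(block_scales[index // block_size])
            values.append(decode_e2m1(code) * scale / global_scale)
    return values
```

For example, under these assumptions `dequantize_row(bytes([0x21, 0xE7]), bytes([0x40]), 2.0)` decodes four elements that share one block scale of 2.0.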

Not quantized (kept in bf16): token embeddings, final/layer norms, the MoE router gate, lm_head, and all biases.

No calibration data was used (weight-only / data-free).
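
To make "data-free" concrete, the sketch below derives every scale from the weight values alone, with no activations involved. The scale-selection rule is an assumption and is not confirmed by this card: it sets the global scale to (448 × 6) ÷ tensor amax and each block scale to the FP8 rounding of global_scale × block amax ÷ 6, which is one common choice that is consistent with the dequant formula above. All names are hypothetical, and ties in rounding resolve toward the smaller magnitude rather than round-to-nearest-even.

```python
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_MAX = 6.0    # largest E2M1 magnitude
E4M3_MAX = 448.0  # largest finite E4M3 magnitude


def e4m3_value(byte: int) -> float:
    """Value of a non-negative finite E4M3 byte (0x00..0x7E), exponent bias 7."""
    exponent, mantissa = byte >> 3, byte & 0x7
    if exponent == 0:
        return (mantissa / 8.0) * 2.0 ** -6  # subnormal
    return (1.0 + mantissa / 8.0) * 2.0 ** (exponent - 7)


# All non-negative finite E4M3 values, indexed by their byte encoding.
E4M3_TABLE = [e4m3_value(b) for b in range(0x7F)]


def nearest_e4m3(x: float) -> int:
    """Byte encoding of the E4M3 value closest to x (brute force, for clarity)."""
    return min(range(0x7F), key=lambda b: abs(E4M3_TABLE[b] - x))


def nearest_e2m1(x: float) -> int:
    """4-bit E2M1 code closest to x; bit 3 carries the sign."""
    code = min(range(8), key=lambda c: abs(E2M1_MAGNITUDES[c] - abs(x)))
    return code | 0x8 if x < 0 else code


def quantize_tensor(values, block_size=16):
    """Return (e2m1_codes, block_scale_bytes, global_scale) for a flat list of weights."""
    amax = max(abs(v) for v in values)
    # ASSUMPTION: the global scale maps the tensor amax onto E4M3_MAX * E2M1_MAX.
    global_scale = (E4M3_MAX * E2M1_MAX) / amax if amax > 0 else 1.0
    codes, scale_bytes = [], []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        block_amax = max(abs(v) for v in block)
        scale_byte = nearest_e4m3(global_scale * block_amax / E2M1_MAX)
        scale = E4M3_TABLE[scale_byte]
        scale_bytes.append(scale_byte)
        for v in block:
            # Inverse of the dequant formula: code ≈ W × global_scale ÷ block_scale.
            scaled = v * global_scale / scale if scale > 0 else 0.0
            codes.append(nearest_e2m1(scaled))
    return codes, scale_bytes, global_scale
```

Because the scales come only from per-block and per-tensor maxima, the result depends on nothing but the weights themselves, which is what makes the conversion feasible as a CPU streaming pass.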

Provenance

Base model: zai-org/GLM-5.2 (MIT)
Conversion: weight-only NVFP4, data-free, CPU streaming
Status: experimental, unverified

Safetensors

Model size: 425B params
Tensor types: F32, BF16, F8_E4M3
