GLM-4.7-NVFP4

Format: NVFP4 (partial quantization of weights and activations to NVFP4, as selected by AutoQuantize).
Base model: zai-org/GLM-4.7
How it was made: AutoQuantized with NVIDIA Model-Optimizer (NVFP4) on 8x RTX PRO 6000 GPUs, using the default calibration mix (cnn_dailymail and nemotron-post-training-dataset-v2).

Check the original model card for information about this model.

MMLU Benchmark Results: Salyut1/GLM-4.7-NVFP4

Summary Table

Groups	Version	Metric	Value	Stderr
MMLU (Total)	2	acc ↑	0.8348	± 0.0030
Social Sciences	2	acc ↑	0.9051	± 0.0052
Other	2	acc ↑	0.8684	± 0.0058
STEM	2	acc ↑	0.8351	± 0.0064
Humanities	2	acc ↑	0.7664	± 0.0059
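
To read the Stderr column, a rough 95% interval can be derived from each row. The sketch below assumes a normal approximation (z = 1.96), which is my assumption and not something stated with the results; `confidence_interval` is a hypothetical helper name.

```python
def confidence_interval(value, stderr, z=1.96):
    """Return an approximate (low, high) interval around a reported accuracy.

    Assumes the sampling distribution is roughly normal, so z = 1.96
    corresponds to about 95% coverage.
    """
    return value - z * stderr, value + z * stderr


# MMLU (Total) row from the summary table: 0.8348 +/- 0.0030
low, high = confidence_interval(0.8348, 0.0030)
print(f"MMLU (Total): {low:.4f} to {high:.4f}")  # MMLU (Total): 0.8289 to 0.8407
```

Under that approximation, group scores whose intervals overlap heavily (for example STEM and MMLU Total) should not be read as meaningfully different.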

STEM

Tasks	Metric	Value	Stderr
High School Biology	acc ↑	0.9516	± 0.0122
College Biology	acc ↑	0.9514	± 0.0180
Astronomy	acc ↑	0.9474	± 0.0182
High School Computer Science	acc ↑	0.9300	± 0.0256
Conceptual Physics	acc ↑	0.9064	± 0.0190
Elementary Mathematics	acc ↑	0.8862	± 0.0164
Electrical Engineering	acc ↑	0.8690	± 0.0281
High School Statistics	acc ↑	0.8565	± 0.0239
College Computer Science	acc ↑	0.8400	± 0.0368
Anatomy	acc ↑	0.8296	± 0.0325
High School Physics	acc ↑	0.7947	± 0.0330
High School Chemistry	acc ↑	0.7882	± 0.0287
Machine Learning	acc ↑	0.7679	± 0.0401
College Physics	acc ↑	0.7647	± 0.0422
Abstract Algebra	acc ↑	0.6800	± 0.0469
College Chemistry	acc ↑	0.6800	± 0.0469
College Mathematics	acc ↑	0.6800	± 0.0469
High School Mathematics	acc ↑	0.6481	± 0.0291
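
The per-task Stderr values look consistent with a plain binomial standard error. The sketch below checks this for the Abstract Algebra row, assuming that task has 100 test questions and that the evaluation tool uses the sample standard deviation (n - 1 in the denominator) divided by sqrt(n). Both are assumptions on my part, and `binomial_stderr` is a hypothetical helper name.

```python
import math


def binomial_stderr(acc, n):
    """Standard error of a mean of n 0/1 outcomes with observed accuracy acc.

    Uses the sample variance acc * (1 - acc) * n / (n - 1), divided by n,
    which simplifies to acc * (1 - acc) / (n - 1).
    """
    return math.sqrt(acc * (1.0 - acc) / (n - 1))


# Abstract Algebra row: acc 0.6800, assumed 100 questions
print(round(binomial_stderr(0.68, 100), 4))  # 0.0469
```

This also explains why the three rows at 0.6800 share the same Stderr: with equal accuracy and an equal number of questions, the formula gives the same value.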

Social Sciences

Tasks	Metric	Value	Stderr
High School Government/Politics	acc ↑	0.9793	± 0.0103
High School Microeconomics	acc ↑	0.9706	± 0.0110
High School Psychology	acc ↑	0.9523	± 0.0091
Human Sexuality	acc ↑	0.9313	± 0.0222
Sociology	acc ↑	0.9204	± 0.0191
High School Geography	acc ↑	0.9192	± 0.0194
High School Macroeconomics	acc ↑	0.9000	± 0.0152
US Foreign Policy	acc ↑	0.9000	± 0.0302
Professional Psychology	acc ↑	0.8725	± 0.0135
Security Studies	acc ↑	0.8653	± 0.0219
Public Relations	acc ↑	0.7636	± 0.0407
Econometrics	acc ↑	0.7544	± 0.0405

Humanities

Tasks	Metric	Value	Stderr
High School US History	acc ↑	0.9461	± 0.0159
High School World History	acc ↑	0.9367	± 0.0158
World Religions	acc ↑	0.9064	± 0.0223
Prehistory	acc ↑	0.8981	± 0.0168
International Law	acc ↑	0.8926	± 0.0283
Jurisprudence	acc ↑	0.8889	± 0.0304
Logical Fallacies	acc ↑	0.8834	± 0.0252
High School European History	acc ↑	0.8788	± 0.0255
Moral Disputes	acc ↑	0.8699	± 0.0181
Philosophy	acc ↑	0.8617	± 0.0196
Formal Logic	acc ↑	0.7460	± 0.0389
Professional Law	acc ↑	0.6610	± 0.0121
Moral Scenarios	acc ↑	0.6425	± 0.0160

Other

Tasks	Metric	Value	Stderr
Medical Genetics	acc ↑	0.9800	± 0.0141
Marketing	acc ↑	0.9530	± 0.0139
Miscellaneous	acc ↑	0.9374	± 0.0087
Professional Medicine	acc ↑	0.9301	± 0.0155
Clinical Knowledge	acc ↑	0.9057	± 0.0180
Nutrition	acc ↑	0.9052	± 0.0168
Management	acc ↑	0.8932	± 0.0306
Business Ethics	acc ↑	0.8600	± 0.0349
Computer Security	acc ↑	0.8600	± 0.0349
Human Aging	acc ↑	0.8161	± 0.0260
College Medicine	acc ↑	0.7977	± 0.0306
Professional Accounting	acc ↑	0.7624	± 0.0254
Global Facts	acc ↑	0.6500	± 0.0479
Virology	acc ↑	0.5723	± 0.0385

vLLM Inference Note:

I needed to patch vllm/model_executor/models/glm4_moe.py so that k_scale and v_scale entries with no match in params_dict are skipped instead of crashing the weight load. The script below fixed my k_scale and v_scale errors.

import os
import re

# Path to the vLLM model file (adjust to match your installation)
path = '/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/glm4_moe.py'

if os.path.exists(path):
    with open(path, 'r') as f:
        lines = f.readlines()

    target_str = 'param = params_dict[name]'
    guard_str = "('k_scale' in name or 'v_scale' in name) and name not in params_dict"
    new_lines = []
    patched = False

    for line in lines:
        # Look for the parameter loading line. If the guard already sits
        # directly above it, leave it alone so re-running the script is safe.
        already_guarded = bool(new_lines) and guard_str in new_lines[-1]
        if target_str in line and not already_guarded:
            whitespace = re.match(r'^(\s*)', line).group(1)

            # Inject logic: if a k_scale/v_scale name is not in params_dict, skip it
            payload = f"{whitespace}if {guard_str}: continue\n"

            new_lines.append(payload)
            new_lines.append(line)
            patched = True
        else:
            new_lines.append(line)

    if patched:
        with open(path, 'w') as f:
            f.writelines(new_lines)
        print(f"Successfully patched {path}")
    else:
        print("File already patched or target not found.")
else:
    print(f"File not found: {path}")
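
To show what the injected guard does, here is a toy weight-loading loop. It is a simplified stand-in and not vLLM's actual loader; `load_weights_sketch`, the parameter names, and the dict-based "parameters" are all hypothetical.

```python
def load_weights_sketch(params_dict, checkpoint_weights):
    """Toy loader: copy checkpoint values into params_dict.

    Entries whose name contains k_scale or v_scale and has no matching
    key in params_dict are skipped instead of raising KeyError.
    """
    loaded, skipped = [], []
    for name, value in checkpoint_weights:
        # Same condition as the line the patch script injects
        if ('k_scale' in name or 'v_scale' in name) and name not in params_dict:
            skipped.append(name)
            continue
        param = params_dict[name]  # raises KeyError for any other unknown name
        param['data'] = value
        loaded.append(name)
    return loaded, skipped


# Hypothetical names, for illustration only
params = {'layers.0.q_proj.weight': {}}
weights = [
    ('layers.0.q_proj.weight', 1.0),
    ('layers.0.self_attn.k_scale', 0.5),
    ('layers.0.self_attn.v_scale', 0.5),
]
loaded, skipped = load_weights_sketch(params, weights)
print(loaded)
print(skipped)
```

Note that the guard only covers k_scale and v_scale names. Any other unexpected name still raises, so real loading problems are not hidden.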

Model size: 177B params (Safetensors)
Tensor types: BF16, F32, F8_E4M3
