Model Description

This is a quantized version of GLM-4.7 with the following setup:

  • Weights: NVFP4
  • KV cache: FP8
  • Tooling: NVIDIA Model Optimizer (NVIDIA/Model-Optimizer)
  • Deployment: TensorRT-LLM
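NVFP4 stores weights as 4-bit floats (E2M1) in small blocks, each block sharing one scale factor. The sketch below is a simplified illustration of that scheme, not the Model Optimizer implementation: the 16-element block size and the E2M1 value grid follow NVIDIA's published NVFP4 description, but for clarity the per-block scale is kept in full precision here, whereas the real format stores it in FP8 E4M3 and packs two 4-bit codes per byte.

```python
# Simplified NVFP4-style block quantization (illustrative only).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # representable FP4 magnitudes
BLOCK = 16  # NVFP4 micro-block size

def quantize_block(block):
    """Map a block of floats onto the signed E2M1 grid with one shared scale."""
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0  # block max lands on the grid max, 6.0
    codes = []
    for x in block:
        # round-to-nearest onto the E2M1 magnitude grid, then restore the sign
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        codes.append(mag if x >= 0 else -mag)
    return scale, codes

def dequantize_block(scale, codes):
    return [scale * c for c in codes]

weights = [0.02 * i - 0.15 for i in range(BLOCK)]  # toy weight block
scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)
```

Because the scale is chosen so the block's largest magnitude maps exactly to 6.0, that element round-trips without error; everything else is rounded to the nearest of the eight E2M1 magnitudes.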

Downloads last month: 7,034
Format: Safetensors
Model size: 177B params
Tensor types: BF16, F32, F8_E4M3, U8
Model tree for soundsgoodai/GLM-4.7-NVFP4-KV-cache-FP8

  • Base model: zai-org/GLM-4.7
  • Quantized variants: 41 (including this model)