Model Description

This is a quantized version of GLM-4.7 with the following setup:

  • Weights: NVFP4
  • KV cache: FP8
  • Tooling: NVIDIA Model Optimizer (NVIDIA/Model-Optimizer)
  • Deployment: TensorRT-LLM
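NVFP4 stores weights as 4-bit floats (E2M1) in small blocks, each block sharing one scale factor. The sketch below is a simplified illustration of that scheme, not the Model Optimizer implementation: the 16-element block size and the E2M1 value grid follow NVIDIA's published NVFP4 description, but for clarity the per-block scale is kept in full precision here, whereas the real format stores it in FP8 E4M3 and packs two 4-bit codes per byte.

```python
# Simplified NVFP4-style block quantization (illustrative only).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # representable FP4 magnitudes
BLOCK = 16  # NVFP4 micro-block size

def quantize_block(block):
    """Map a block of floats onto the signed E2M1 grid with one shared scale."""
    amax = max(abs(x) for x in block)
    scale = amax / 6.0 if amax > 0 else 1.0  # block max lands on the grid max, 6.0
    codes = []
    for x in block:
        # round-to-nearest onto the E2M1 magnitude grid, then restore the sign
        mag = min(E2M1_GRID, key=lambda g: abs(abs(x) / scale - g))
        codes.append(mag if x >= 0 else -mag)
    return scale, codes

def dequantize_block(scale, codes):
    return [scale * c for c in codes]

weights = [0.02 * i - 0.15 for i in range(BLOCK)]  # toy weight block
scale, codes = quantize_block(weights)
restored = dequantize_block(scale, codes)
```

Because the scale is chosen so the block's largest magnitude maps exactly to 6.0, that element round-trips without error; everything else is rounded to the nearest of the eight E2M1 magnitudes.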

Downloads last month: 7,034
Format: Safetensors
Model size: 177B params
Tensor types: BF16, F32, F8_E4M3, U8
Model tree for soundsgoodai/GLM-4.7-NVFP4-KV-cache-FP8

  • Base model: zai-org/GLM-4.7
  • Quantized variants: 41 (including this model)