KV cache scales
#1 opened by haydn-jones
Getting this from vllm:
(Worker_TP1 pid=1021763) WARNING 12-03 21:31:53 [weight_utils.py:1146] Found v_scale in the checkpoint (e.g. model.layers.9.self_attn.v_proj.v_scale), but not found the expected name in the model (e.g. model.layers.9.self_attn.attn.v_scale). v_scale is not loaded.
Is this expected? The quant config suggests an fp8 KV cache was intended, but the scales don't seem to load.
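If the mismatch is purely a naming one, a minimal sketch of remapping the checkpoint keys might look like the following. The two layouts (`…self_attn.v_proj.v_scale` in the checkpoint vs. `…self_attn.attn.v_scale` expected by the model) are taken from the warning above; whether a rename like this is the correct fix for this checkpoint, and whether `k_scale` follows the same pattern, are assumptions.

```python
# Hypothetical sketch: move k_scale/v_scale entries from the per-projection
# modules (k_proj/v_proj) to the attn module name vLLM reports expecting.
# Key patterns are assumptions based on the warning text above.
import re

_SCALE_PATTERN = re.compile(r"\.self_attn\.(?:k_proj|v_proj)\.(k_scale|v_scale)$")

def remap_kv_scales(state_dict):
    """Return a copy of state_dict with KV-cache scales renamed to *.attn.*."""
    remapped = {}
    for name, value in state_dict.items():
        new_name = _SCALE_PATTERN.sub(lambda m: f".self_attn.attn.{m.group(1)}", name)
        remapped[new_name] = value
    return remapped

# Dummy scalar values stand in for real tensors:
ckpt = {
    "model.layers.9.self_attn.v_proj.v_scale": 0.021,
    "model.layers.9.self_attn.k_proj.k_scale": 0.018,
    "model.layers.9.self_attn.v_proj.weight": "unchanged",
}
fixed = remap_kv_scales(ckpt)
```

After remapping, the scales would sit under the module names the loader looks for, while all other keys pass through untouched.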