KV cache scales

#1
by haydn-jones - opened

Getting this from vllm:

(Worker_TP1 pid=1021763) WARNING 12-03 21:31:53 [weight_utils.py:1146] Found v_scale in the checkpoint (e.g. model.layers.9.self_attn.v_proj.v_scale), but not found the expected name in the model (e.g. model.layers.9.self_attn.attn.v_scale). v_scale is not loaded.

Is this expected? The quant config suggests an FP8 KV cache was intended, but the scales don't seem to load.
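For reference, the mismatch in the warning is purely a naming one: the checkpoint stores the scale on the projection submodule (`...self_attn.v_proj.v_scale`) while vLLM looks for it on the attention module (`...self_attn.attn.v_scale`). A minimal sketch of the rename, assuming a plain dict of tensor names; `remap_scale_key` is a hypothetical helper for illustration, not part of vLLM's API:

```python
import re

def remap_scale_key(name: str) -> str:
    """Move k_scale/v_scale from the k_proj/v_proj submodule to attn.

    Hypothetical helper: rewrites the checkpoint-side name into the
    name vLLM's warning says it expects. Non-scale keys pass through.
    """
    return re.sub(
        r"\.self_attn\.[kv]_proj\.([kv]_scale)$",
        r".self_attn.attn.\1",
        name,
    )

# The exact name from the warning above:
src = "model.layers.9.self_attn.v_proj.v_scale"
print(remap_scale_key(src))  # model.layers.9.self_attn.attn.v_scale
```

Applying this remapping over a checkpoint's state-dict keys (e.g. before saving a fixed copy) would produce the names the loader expects, assuming the scale values themselves are otherwise compatible.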
