KV cache scales
#1 opened by haydn-jones
Getting this from vllm:
(Worker_TP1 pid=1021763) WARNING 12-03 21:31:53 [weight_utils.py:1146] Found v_scale in the checkpoint (e.g. model.layers.9.self_attn.v_proj.v_scale), but not found the expected name in the model (e.g. model.layers.9.self_attn.attn.v_scale). v_scale is not loaded.
Is this expected? The quant config suggests an fp8 KV cache was intended, but the scales don't seem to load.
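If the mismatch is purely a naming one, a minimal sketch of remapping the checkpoint keys might look like the following. The two layouts (`…self_attn.v_proj.v_scale` in the checkpoint vs. `…self_attn.attn.v_scale` expected by the model) are taken from the warning above; whether a rename like this is the correct fix for this checkpoint, and whether `k_scale` follows the same pattern, are assumptions.

```python
# Hypothetical sketch: move k_scale/v_scale entries from the per-projection
# modules (k_proj/v_proj) to the attn module name vLLM reports expecting.
# Key patterns are assumptions based on the warning text above.
import re

_SCALE_PATTERN = re.compile(r"\.self_attn\.(?:k_proj|v_proj)\.(k_scale|v_scale)$")

def remap_kv_scales(state_dict):
    """Return a copy of state_dict with KV-cache scales renamed to *.attn.*."""
    remapped = {}
    for name, value in state_dict.items():
        new_name = _SCALE_PATTERN.sub(lambda m: f".self_attn.attn.{m.group(1)}", name)
        remapped[new_name] = value
    return remapped

# Dummy scalar values stand in for real tensors:
ckpt = {
    "model.layers.9.self_attn.v_proj.v_scale": 0.021,
    "model.layers.9.self_attn.k_proj.k_scale": 0.018,
    "model.layers.9.self_attn.v_proj.weight": "unchanged",
}
fixed = remap_kv_scales(ckpt)
```

After remapping, the scales would sit under the module names the loader looks for, while all other keys pass through untouched.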