Error serving model

#2
by EvGUT - opened

Hi, trying to serve this model with vLLM and I get this error:

```
File "/home/ubuntu/venv/lib/python3.10/site-packages/vllm/model_executor/layers/quantization/utils/marlin_utils_fp8.py", line 79, in prepare_fp8_layer_for_marlin
[rank0]: is_channelwise = layer.weight_scale.shape[0] == part_size_n
[rank0]: IndexError: tuple index out of range
```

Any ideas how to solve this?

vLLM is built from source | 9364f74eee2e8aab9e3c9cd6dea290018ef43b95
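The `IndexError` in the traceback indicates that `weight_scale` is a 0-dimensional (per-tensor) scalar, whose `shape` is an empty tuple, so indexing `shape[0]` fails. A minimal sketch of that failure mode, using NumPy arrays as stand-ins for the PyTorch tensors vLLM actually uses:

```python
import numpy as np

# Per-tensor quantization stores the scale as a 0-d scalar,
# while channelwise quantization stores one scale per output channel.
per_tensor_scale = np.array(0.05)           # shape == () — empty tuple
channelwise_scale = np.full(4096, 0.05)     # shape == (4096,)

# The channelwise check works: shape[0] exists.
print(channelwise_scale.shape[0])           # 4096

# But on a 0-d scalar, indexing the empty shape tuple raises.
try:
    per_tensor_scale.shape[0]
except IndexError as e:
    print(f"IndexError: {e}")               # tuple index out of range
```

This is only an illustration of the symptom; the actual fix landed upstream in vLLM.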

Red Hat AI org

Hey @EvGUT thanks for reporting this, we think it is a bug that just landed with that commit. Can you please try to build with the commit just before this - 9042d683620a7e3fa75c953fe9cca29086ce2b9a?

Thank you, the previous commit solved the problem.

EvGUT changed discussion status to closed
Red Hat AI org

Thanks again for reporting, it should be resolved with https://github.com/vllm-project/vllm/pull/6609
