Error serving model
#2 by EvGUT - opened
Hi, I'm trying to serve this model with vLLM and get this error:
File "/home/ubuntu/venv/lib/python3.10/site-packages/vllm/model_executor/layers/quantization/utils/marlin_utils_fp8.py", line 79, in prepare_fp8_layer_for_marlin
[rank0]: is_channelwise = layer.weight_scale.shape[0] == part_size_n
[rank0]: IndexError: tuple index out of range
Any ideas how to solve this?
vLLM is built from source at commit 9364f74eee2e8aab9e3c9cd6dea290018ef43b95.
Thank you, the previous commit solved the problem.
EvGUT changed discussion status to closed
Thanks again for reporting. It should be resolved with https://github.com/vllm-project/vllm/pull/6609
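For anyone landing here with the same traceback, here is a minimal sketch of why the line `layer.weight_scale.shape[0] == part_size_n` can raise `IndexError: tuple index out of range`. It assumes the checkpoint stores a single per-tensor scale as a 0-dim value, so its shape is the empty tuple `()`. That is a guess from the error message, not something confirmed in this thread. The helper names and the guard are hypothetical, written on plain shape tuples so it runs without vLLM or PyTorch; it is not necessarily what the linked PR does.

```python
def is_channelwise(scale_shape: tuple, part_size_n: int) -> bool:
    # Same comparison as the failing line in the traceback, applied to a
    # plain shape tuple. Raises IndexError when scale_shape is ().
    return scale_shape[0] == part_size_n


def is_channelwise_guarded(scale_shape: tuple, part_size_n: int) -> bool:
    # Hypothetical guard (an assumption, not vLLM's code): a 0-dim scale
    # is treated as per-tensor, so it is never channelwise.
    return len(scale_shape) > 0 and scale_shape[0] == part_size_n


if __name__ == "__main__":
    part_size_n = 4096  # illustrative output-partition size

    try:
        is_channelwise((), part_size_n)
    except IndexError as exc:
        print(f"unguarded check failed: {exc}")

    print(is_channelwise_guarded((), part_size_n))       # per-tensor scale
    print(is_channelwise_guarded((4096,), part_size_n))  # per-channel scale
```

If this matches your case, the practical options from the thread still apply: build from the commit before 9364f74eee2e8aab9e3c9cd6dea290018ef43b95, or use a build that includes the PR above.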