Failed with v0.9.2 on 8x2080Ti 22GB
#2
by xydarcher - opened
It reports:
WARNING 07-30 10:43:12 [config.py:440] MoE DP setup unable to determine quantization scheme or unsupported quantization type. This model will not run with DP enabled.
and
File "/root/.local/share/pipx/venvs/vllm-v0-9-2/lib/python3.10/site-packages/vllm/model_executor/models/qwen3_moe.py", line 113, in __init__
ERROR 07-30 10:43:13 [multiproc_worker_utils.py:239] self.experts = FusedMoE(num_experts=config.num_experts,
ERROR 07-30 10:43:13 [multiproc_worker_utils.py:239] File "/root/.local/share/pipx/venvs/vllm-v0-9-2/lib/python3.10/site-packages/vllm/model_executor/layers/fused_moe/layer.py", line 735, in __init__
ERROR 07-30 10:43:13 [multiproc_worker_utils.py:239] assert quant_method is not None
ERROR 07-30 10:43:13 [multiproc_worker_utils.py:239] AssertionError
Has anyone run this successfully?
The GPTQ‑Marlin MoE module is only compatible with Compute Capability ≥ 8.0 (Ampere and newer). Compute 7.x devices (like the RTX 2080 Ti) do not natively support the GPTQ‑Marlin mode.
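For reference, you can confirm what your GPUs report with a quick check (a minimal sketch using PyTorch's `torch.cuda.get_device_capability`; a 2080 Ti reports 7.5):

```python
import torch

# Marlin kernels need compute capability >= 8.0 (Ampere or newer).
# An RTX 2080 Ti (Turing) reports (7, 5), so the check fails on that card.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    ok = (major, minor) >= (8, 0)
    print(f"GPU {i}: compute {major}.{minor} -> Marlin supported: {ok}")
```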
Is it possible to port this feature to the gptq.py file? Maybe I could try with the help of an LLM; see the sketch below for where the change would have to go.
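For orientation, here is a schematic of why the assertion fires. This is not vLLM's actual code; all names below are illustrative stand-ins. The idea: the GPTQ config only hands out an MoE-capable quant method when the Marlin path is usable, so on compute 7.5 the lookup yields None and FusedMoE's `assert quant_method is not None` trips. A port to gptq.py would mean filling in the None branch with a plain-GPTQ MoE method:

```python
# Schematic only: toy stand-ins, not vLLM's real classes or signatures.

class FusedMoE:
    """Stand-in for vLLM's fused MoE layer."""

class GPTQMarlinMoEMethod:
    """Stand-in for the Marlin-backed GPTQ MoE method (needs CC >= 8.0)."""

def device_capability() -> tuple[int, int]:
    # Hard-coded for illustration; an RTX 2080 Ti reports (7, 5).
    return (7, 5)

def get_quant_method(layer):
    if isinstance(layer, FusedMoE):
        if device_capability() >= (8, 0):
            return GPTQMarlinMoEMethod()
        # Pre-Ampere: no MoE-capable GPTQ kernel exists, so None comes back.
        # A "port to gptq.py" would mean returning a non-Marlin GPTQ MoE
        # method here instead of None.
        return None
    return None

quant_method = get_quant_method(FusedMoE())
assert quant_method is not None  # AssertionError on compute 7.5, as in the log
```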