Failed with v0.9.2 on 8x2080Ti 22GB

#2
by xydarcher - opened

It reports:
WARNING 07-30 10:43:12 [config.py:440] MoE DP setup unable to determine quantization scheme or unsupported quantization type. This model will not run with DP enabled.
and
and

 File "/root/.local/share/pipx/venvs/vllm-v0-9-2/lib/python3.10/site-packages/vllm/model_executor/models/qwen3_moe.py", line 113, in __init__
ERROR 07-30 10:43:13 [multiproc_worker_utils.py:239]     self.experts = FusedMoE(num_experts=config.num_experts,
ERROR 07-30 10:43:13 [multiproc_worker_utils.py:239]   File "/root/.local/share/pipx/venvs/vllm-v0-9-2/lib/python3.10/site-packages/vllm/model_executor/layers/fused_moe/layer.py", line 735, in __init__
ERROR 07-30 10:43:13 [multiproc_worker_utils.py:239]     assert quant_method is not None
ERROR 07-30 10:43:13 [multiproc_worker_utils.py:239] AssertionError

Has anyone managed to run this successfully?

QuantTrio org

The GPTQ‑Marlin MoE module is only compatible with Compute Capability ≥ 8.0 (Ampere and newer). Compute 7.x devices (like RTX 2080 Ti) do not natively support the GPTQ‑Marlin mode.
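As a quick sanity check before launching vLLM, you can query the GPU's Compute Capability with PyTorch and compare it against the >= 8.0 requirement described above. This is a minimal sketch; the helper function `supports_gptq_marlin` is a hypothetical name for illustration, not part of vLLM's API.

```python
def supports_gptq_marlin(major: int, minor: int) -> bool:
    # GPTQ-Marlin MoE kernels require Compute Capability >= 8.0 (Ampere+).
    # The RTX 2080 Ti (Turing) reports 7.5, so it fails this check.
    return (major, minor) >= (8, 0)

try:
    import torch
    if torch.cuda.is_available():
        cc = torch.cuda.get_device_capability(0)
        print(f"Compute Capability {cc[0]}.{cc[1]}: "
              f"GPTQ-Marlin supported = {supports_gptq_marlin(*cc)}")
except ImportError:
    # PyTorch not installed; call supports_gptq_marlin with a known CC instead.
    pass
```

On an 8x2080Ti box this should report Compute Capability 7.5 and `supported = False`, matching the AssertionError above.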

Is it possible to port this feature to the gptq.py file? Maybe I could try with the help of an LLM.
