Failed with v0.9.2 on 8x2080Ti 22GB
#2
by xydarcher - opened
It reports:
WARNING 07-30 10:43:12 [config.py:440] MoE DP setup unable to determine quantization scheme or unsupported quantization type. This model will not run with DP enabled.
and
File "/root/.local/share/pipx/venvs/vllm-v0-9-2/lib/python3.10/site-packages/vllm/model_executor/models/qwen3_moe.py", line 113, in __init__
ERROR 07-30 10:43:13 [multiproc_worker_utils.py:239] self.experts = FusedMoE(num_experts=config.num_experts,
ERROR 07-30 10:43:13 [multiproc_worker_utils.py:239] File "/root/.local/share/pipx/venvs/vllm-v0-9-2/lib/python3.10/site-packages/vllm/model_executor/layers/fused_moe/layer.py", line 735, in __init__
ERROR 07-30 10:43:13 [multiproc_worker_utils.py:239] assert quant_method is not None
ERROR 07-30 10:43:13 [multiproc_worker_utils.py:239] AssertionError
Has anyone run this successfully?
The GPTQ‑Marlin MoE module is only compatible with Compute Capability ≥ 8.0 (Ampere and newer). Compute 7.x devices (like the RTX 2080 Ti) do not natively support the GPTQ‑Marlin mode.
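For reference, you can confirm what your GPUs report with a quick check (a minimal sketch using PyTorch's `torch.cuda.get_device_capability`; a 2080 Ti reports 7.5):

```python
import torch

# Marlin kernels need compute capability >= 8.0 (Ampere or newer).
# An RTX 2080 Ti (Turing) reports (7, 5), so the check fails on that card.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    ok = (major, minor) >= (8, 0)
    print(f"GPU {i}: compute {major}.{minor} -> Marlin supported: {ok}")
```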
Is it possible to port this feature to the gptq.py file? Maybe I could try with the help of an LLM; see the sketch below for where the change would have to go.
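For orientation, here is a schematic of why the assertion fires. This is not vLLM's actual code; all names below are illustrative stand-ins. The idea: the GPTQ config only hands out an MoE-capable quant method when the Marlin path is usable, so on compute 7.5 the lookup yields None and FusedMoE's `assert quant_method is not None` trips. A port to gptq.py would mean filling in the None branch with a plain-GPTQ MoE method:

```python
# Schematic only: toy stand-ins, not vLLM's real classes or signatures.

class FusedMoE:
    """Stand-in for vLLM's fused MoE layer."""

class GPTQMarlinMoEMethod:
    """Stand-in for the Marlin-backed GPTQ MoE method (needs CC >= 8.0)."""

def device_capability() -> tuple[int, int]:
    # Hard-coded for illustration; an RTX 2080 Ti reports (7, 5).
    return (7, 5)

def get_quant_method(layer):
    if isinstance(layer, FusedMoE):
        if device_capability() >= (8, 0):
            return GPTQMarlinMoEMethod()
        # Pre-Ampere: no MoE-capable GPTQ kernel exists, so None comes back.
        # A "port to gptq.py" would mean returning a non-Marlin GPTQ MoE
        # method here instead of None.
        return None
    return None

quant_method = get_quant_method(FusedMoE())
assert quant_method is not None  # AssertionError on compute 7.5, as in the log
```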