Running this model with sglang fails with an AssertionError: "quant_method is not None"

by wangliuwei - opened Aug 9, 2025


sglang version: v0.4.10.post2

Launch command:
python3 -m sglang.launch_server \
  --model-path /deepseek-v3/DeepSeek-R1-0528-AWQ-W4AFP8 \
  --speculative-algorithm NEXTN \
  --speculative-num-steps 2 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 8 \
  --served-model-name deepseek-r1-0528 \
  --trust-remote-code \
  --tp 8 \
  --host 0.0.0.0 --port 30000 \
  --max-prefill-tokens 16384 \
  --max-running-requests 48 \
  --disable-radix-cache \
  --mem-fraction-static 0.85 \
  --chunked-prefill-size 8192 \
  --schedule-conservativeness 0.01 \
  --cuda-graph-max-bs=160 \
  --quantization awq \
  --dtype float16 \
  --stream-output
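The command above forces `--quantization awq`, while the checkpoint directory name mentions W4AFP8, which may be a different scheme from plain AWQ. A minimal sketch for checking what the checkpoint itself declares follows. It assumes the model directory contains a standard Hugging Face `config.json` with an optional `quantization_config.quant_method` entry. The helper name `declared_quant_method` is hypothetical and is not part of sglang.

```python
import json
from pathlib import Path
from typing import Optional


def declared_quant_method(model_dir: str) -> Optional[str]:
    """Return quantization_config.quant_method from <model_dir>/config.json, or None.

    Hypothetical helper: assumes the Hugging Face convention of storing
    quantization details under the "quantization_config" key.
    """
    config_path = Path(model_dir) / "config.json"
    with config_path.open(encoding="utf-8") as f:
        config = json.load(f)
    # A missing or null quantization_config means no method is declared.
    quant_cfg = config.get("quantization_config") or {}
    return quant_cfg.get("quant_method")
```

For example, `declared_quant_method("/deepseek-v3/DeepSeek-R1-0528-AWQ-W4AFP8")` returns whatever string the checkpoint declares (or `None`). That value can then be compared with the `--quantization awq` flag passed on the command line.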

Error message:
[2025-08-09 14:23:53] Received sigquit from a child process. It usually means the child failed.
[2025-08-09 14:23:54 TP6] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2534, in run_scheduler_process
    scheduler = Scheduler(
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 313, in __init__
    self.tp_worker = TpWorkerClass(
  File "/sgl-workspace/sglang/python/sglang/srt/managers/tp_worker.py", line 84, in __init__
    self.model_runner = ModelRunner(
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 242, in __init__
    self.initialize(min_per_gpu_memory)
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 285, in initialize
    self.load_model()
  File "/sgl-workspace/sglang/python/sglang/srt/model_executor/model_runner.py", line 643, in load_model
    self.model = get_model(
  File "/sgl-workspace/sglang/python/sglang/srt/model_loader/__init__.py", line 22, in get_model
    return loader.load_model(
  File "/sgl-workspace/sglang/python/sglang/srt/model_loader/loader.py", line 432, in load_model
    model = _initialize_model(
  File "/sgl-workspace/sglang/python/sglang/srt/model_loader/loader.py", line 174, in _initialize_model
    return model_class(
  File "/sgl-workspace/sglang/python/sglang/srt/models/deepseek_v2.py", line 2075, in __init__
    self.model = DeepseekV2Model(
  File "/sgl-workspace/sglang/python/sglang/srt/models/deepseek_v2.py", line 1990, in __init__
    [
  File "/sgl-workspace/sglang/python/sglang/srt/models/deepseek_v2.py", line 1991, in <listcomp>
    DeepseekV2DecoderLayer(
  File "/sgl-workspace/sglang/python/sglang/srt/models/deepseek_v2.py", line 1792, in __init__
    self.mlp = DeepseekV2MoE(
  File "/sgl-workspace/sglang/python/sglang/srt/models/deepseek_v2.py", line 325, in __init__
    self.experts = get_moe_impl_class()(
  File "/sgl-workspace/sglang/python/sglang/srt/layers/moe/fused_moe_triton/layer.py", line 161, in __init__
    assert self.quant_method is not None
AssertionError
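The traceback ends in the fused-MoE layer constructor, which asserts that it received a quantization method. One plausible reading, not confirmed here, is that the quantization config selected by `--quantization awq` returned no method for the MoE expert layers. The following is a simplified, self-contained model of that pattern only. `LinearBase`, `FusedMoE`, `LinearOnlyQuantConfig` and `try_build_moe` are hypothetical stand-ins and are not sglang's actual classes.

```python
from typing import Optional, Protocol


class QuantConfig(Protocol):
    """Hypothetical interface: maps a layer to a quantization method (or None)."""

    def get_quant_method(self, layer: object, prefix: str) -> Optional[str]: ...


class LinearBase:
    """Hypothetical stand-in for a dense linear layer."""


class FusedMoE:
    """Hypothetical stand-in for a fused-MoE layer: it asks the quant config
    for a method and asserts that one was returned."""

    def __init__(self, quant_config: Optional[QuantConfig], prefix: str = "") -> None:
        if quant_config is None:
            self.quant_method: Optional[str] = "unquantized"
        else:
            self.quant_method = quant_config.get_quant_method(self, prefix)
        # Mirrors the failing line in the traceback.
        assert self.quant_method is not None


class LinearOnlyQuantConfig:
    """Hypothetical quant config that only knows how to handle linear layers."""

    def get_quant_method(self, layer: object, prefix: str) -> Optional[str]:
        if isinstance(layer, LinearBase):
            return "linear-method"
        return None  # nothing for MoE layers, so FusedMoE's assert fails


def try_build_moe(quant_config: Optional[QuantConfig]) -> str:
    """Build a FusedMoE layer and report whether the assertion fired."""
    try:
        FusedMoE(quant_config, prefix="model.layers.3.mlp.experts")  # hypothetical prefix
        return "ok"
    except AssertionError:
        return "AssertionError"
```

In this model, `try_build_moe(None)` succeeds. `try_build_moe(LinearOnlyQuantConfig())` hits the same kind of assertion as the log, because that config has no method for MoE layers.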
