sglang 0.5.9 deployment failed!
#9, opened by DeepBird
  File "/sgl-workspace/sglang/python/sglang/srt/models/step3p5.py", line 148, in __init__
    self.experts = get_moe_impl_class(quant_config)(
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/layers/moe/fused_moe_triton/layer.py", line 292, in __init__
    self.quant_method.create_weights(
  File "/sgl-workspace/sglang/python/sglang/srt/layers/quantization/fp8.py", line 727, in create_weights
    raise ValueError(
ValueError: The output_size of gate's and up's weight = 320 is not divisible by weight quantization block_n = 128.
It looks like expert parallelism is not enabled. Could you share the command you used to deploy, or check the deployment instructions in the documentation?
sglang serve --model-path <MODEL_PATH_OR_HF_ID> \
--served-model-name step3p5-flash \
--tp-size 8 \
--ep-size 8 \
--tool-call-parser step3p5 \
--reasoning-parser step3p5 \
--speculative-algorithm EAGLE \
--speculative-num-steps 3 \
--speculative-eagle-topk 1 \
--speculative-num-draft-tokens 4 \
--enable-multi-layer-eagle \
--host 0.0.0.0 \
--port 8000
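
For anyone hitting the same ValueError, here is a rough sketch of the arithmetic behind it. It is not sglang's actual code. It assumes that the fused gate/up weight of each expert is split across the MoE tensor-parallel ranks, and that with `--ep-size` equal to `--tp-size` each expert stays whole on one rank. The intermediate size of 1280 is a hypothetical value picked only because it reproduces the 320 from the error message; the helper names are made up for illustration.

```python
BLOCK_N = 128             # weight quantization block_n, taken from the error message
INTERMEDIATE_SIZE = 1280  # hypothetical per-expert intermediate size (assumption)


def fused_gate_up_size(intermediate_size: int, moe_tp_size: int) -> int:
    """Per-rank output size of the fused gate+up projection for one expert."""
    if intermediate_size % moe_tp_size != 0:
        raise ValueError("intermediate_size must be divisible by moe_tp_size")
    # gate and up are stacked, so the fused output is twice the per-rank slice
    return 2 * (intermediate_size // moe_tp_size)


def is_block_aligned(output_size: int, block_n: int = BLOCK_N) -> bool:
    """True if the output size is a whole number of quantization blocks."""
    return output_size % block_n == 0


# Tensor parallel only: every expert is sliced across 8 ranks.
sharded = fused_gate_up_size(INTERMEDIATE_SIZE, moe_tp_size=8)
# Expert parallel across the same 8 ranks (assumed): experts are not sliced.
unsharded = fused_gate_up_size(INTERMEDIATE_SIZE, moe_tp_size=1)

for label, size in (("tp only", sharded), ("with ep", unsharded)):
    print(f"{label}: output_size={size}, remainder={size % BLOCK_N}, "
          f"aligned={is_block_aligned(size)}")
```

Under these assumptions the sliced case leaves a remainder of 64 (320 is not a multiple of 128), while the unsliced case divides evenly, which would be consistent with the suggestion to enable expert parallelism.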