When running GLM-4.7 with sglang-0.5.6, a /v1/chat/completions request fails with a BadRequestError
#30 opened by tuo02
I am running GLM-4.7 on 32 RTX 5090 cards. This is my launch command:
python3 -m sglang.launch_server \
--model-path /mnt/data/models/GLM-4.7 \
--tp-size 32 \
--trust-remote-code \
--dist-init-addr $MASTER_ADDR:$MASTER_PORT \
--nnodes $WORLD_SIZE \
--node-rank $RANK \
--tool-call-parser glm \
--reasoning-parser glm45 \
--speculative-algorithm EAGLE \
--speculative-num-steps 3 \
--speculative-eagle-topk 1 \
--speculative-num-draft-tokens 4 \
--mem-fraction-static 0.7 \
--context-length 202752 \
--served-model-name zai-org/GLM-4.7 \
--model-loader-extra-config='{"enable_multithread_load": "true","num_threads": 64}' \
--enable-metrics
and the request is:
curl localhost:30000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "zai-org/GLM-4.7",
"messages": [
{
"role": "user",
"content": "Hello"
}
],
"temperature": 0.7,
"max_tokens": 100
}'
while the response is:
{"object":"error","message":"input_ids should be a list of lists for batch processing.","type":"BadRequestError","param":null,"code":400}
Is there something wrong?
Try using transformers 5.0.0rc1 or 4.57.1; one of these should work.
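For example, to pin the suggested version in a pip-managed environment (a sketch; adjust the package manager invocation for your setup, e.g. uv or conda):

```shell
# Pin transformers to the version reported to work with sglang-0.5.6.
pip install "transformers==4.57.1"

# Verify the installed version before relaunching the server.
python3 -c "import transformers; print(transformers.__version__)"
```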
4.57.1 is ok.
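For anyone hitting this later: the error text ("input_ids should be a list of lists for batch processing") refers to the shape of the tokenized input. A plain-Python sketch of the distinction, not sglang's actual validation code:

```python
# A single tokenized request is a flat list of token ids; batch
# processing expects a list of such lists (one inner list per
# sequence). This helper wraps a single sequence into a batch of one.
def normalize_batch(input_ids):
    """Return input_ids as a list of sequences (list of lists)."""
    if input_ids and isinstance(input_ids[0], int):
        return [input_ids]  # single sequence -> batch of one
    return input_ids

single = [101, 7592, 102]        # one tokenized sequence (flat list)
batch = normalize_batch(single)  # wrapped: [[101, 7592, 102]]
assert all(isinstance(seq, list) for seq in batch)
```

A transformers version mismatch can change which of these shapes the tokenizer hands back, which is consistent with pinning the version fixing the 400 error.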