When running GLM-4.7 with sglang 0.5.6, a /v1/chat/completions request fails with a BadRequestError

#30
by tuo02 - opened

I am running GLM-4.7 on 32 RTX 5090 cards. This is my launch command:

python3 -m sglang.launch_server \
  --model-path /mnt/data/models/GLM-4.7 \
  --tp-size 32 \
  --trust-remote-code \
  --dist-init-addr $MASTER_ADDR:$MASTER_PORT \
  --nnodes $WORLD_SIZE \
  --node-rank $RANK \
  --tool-call-parser glm  \
  --reasoning-parser glm45 \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --mem-fraction-static 0.7 \
  --context-length 202752 \
  --served-model-name zai-org/GLM-4.7 \
  --model-loader-extra-config='{"enable_multithread_load": "true","num_threads": 64}' \
  --enable-metrics

and the request is:

curl localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org/GLM-4.7",
    "messages": [
      {
        "role": "user",
        "content": "Hello"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 100
  }'

while the response is:

{"object":"error","message":"input_ids should be a list of lists for batch processing.","type":"BadRequestError","param":null,"code":400}
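The error message refers to the batch shape that transformers' tokenization utilities expect: `input_ids` must be a list of token-ID lists, one list per sequence, not a single flat list. A minimal illustration of that shape requirement (the `ensure_batched` helper below is hypothetical, written only to show the check, not sglang's actual code):

```python
def ensure_batched(input_ids):
    """Normalize input_ids to batch form: a list of token-ID lists."""
    if not input_ids:
        raise ValueError("input_ids is empty")
    if isinstance(input_ids[0], int):
        # a single flat sequence like [1, 2, 3] -> wrap it as [[1, 2, 3]]
        return [input_ids]
    if all(isinstance(seq, list) for seq in input_ids):
        # already batched: a list of sequences
        return input_ids
    raise ValueError("input_ids should be a list of lists for batch processing.")

print(ensure_batched([1, 2, 3]))       # -> [[1, 2, 3]]
print(ensure_batched([[1, 2], [3]]))   # -> [[1, 2], [3]]
```

If a tokenizer version returns the wrong shape for this interface, a single-sequence request like the one above can trigger exactly this 400 error.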

Is there something wrong?

Try transformers 5.0.0rc1 or 4.57.1; one of these should work.

4.57.1 works.
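For anyone hitting the same error, pinning transformers to the version confirmed above before relaunching the server is a minimal sketch of the fix, assuming a standard pip environment (adjust to your setup):

```shell
# Pin transformers to the version reported working in this thread
# with sglang 0.5.6 + GLM-4.7
pip install "transformers==4.57.1"

# confirm the installed version before relaunching sglang
python3 -c "import transformers; print(transformers.__version__)"
```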
