Instructions to use Qwen/Qwen3-Reranker-0.6B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Qwen/Qwen3-Reranker-0.6B with Transformers:
```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Reranker-0.6B")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-Reranker-0.6B")
```
- sentence-transformers
How to use Qwen/Qwen3-Reranker-0.6B with sentence-transformers:
```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("Qwen/Qwen3-Reranker-0.6B")
query = "Which planet is known as the Red Planet?"
passages = [
    "Venus is often called Earth's twin because of its similar size and proximity.",
    "Mars, known for its reddish appearance, is often referred to as the Red Planet.",
    "Jupiter, the largest planet in our solar system, has a prominent red spot.",
    "Saturn, famous for its rings, is sometimes mistaken for the Red Planet.",
]
scores = model.predict([(query, passage) for passage in passages])
print(scores)
```
- Notebooks
- Google Colab
- Kaggle
vLLM online serving
#7
by aperez900907 - opened
When running the model with vLLM, it logs this warning:
WARNING 06-13 09:05:21 [api_server.py:848] To indicate that the rerank API is not part of the standard OpenAI API, we have located it at /rerank. Please update your client accordingly. (Note: Conforms to JinaAI rerank API)
I queried /rerank, /v1/rerank, and /v2/rerank, and always get this error:
```json
{
  "object": "error",
  "message": "The model does not support Rerank (Score) API",
  "type": "BadRequestError",
  "param": null,
  "code": 400
}
```
The requests reach all of these endpoints (they respond with HTTP 200), but reranking itself does not work.
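For reference, this is roughly the shape of a request to vLLM's JinaAI-compatible /rerank endpoint. The host, port, and field values are assumptions for illustration; the server must already be running:

```shell
# Query the rerank endpoint (JinaAI-compatible shape).
# Assumes vLLM is serving on localhost:8000.
curl -X POST http://localhost:8000/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-Reranker-0.6B",
    "query": "Which planet is known as the Red Planet?",
    "documents": [
      "Venus is often called Earth'\''s twin.",
      "Mars is often referred to as the Red Planet."
    ]
  }'
```

If the model was not started in a scoring-capable mode, this request returns the "does not support Rerank (Score) API" error shown above.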
How are you running this?
Try running with `--task score`.
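A minimal sketch of the suggested launch command, assuming the `vllm serve` CLI is installed and the model fits in local GPU memory:

```shell
# Serve the model as a scoring/reranking model so the
# /rerank (score) API is enabled instead of text generation.
vllm serve Qwen/Qwen3-Reranker-0.6B --task score
```

With the task set to `score`, the /rerank endpoint should accept requests instead of returning the "does not support Rerank (Score) API" error.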