Update vLLM usage docs: remove config_vllm.json overwrite, relax version pin, and clarify minimal required flags
#18
by nvidia-oliver-holworthy - opened
This PR updates the vLLM Usage section in README.md to reflect current behavior for nvidia/llama-nemotron-embed-1b-v2.
What changed
- Updated version guidance from `vllm==0.16.0` to `vllm>=0.14.0`.
- Removed the outdated step to overwrite `config.json` with `config_vllm.json`.
- Simplified the serving command to the minimal required invocation using the HF repo ID: `vllm serve nvidia/llama-nemotron-embed-1b-v2 --trust-remote-code`.
- Clarified that a local path can also be used instead of the HF repo ID.
- Removed `--runner pooling` and `--pooler-config` from the recommended flags.
- Kept only operational optional flags (`--dtype`, `--data-parallel-size`, `--port`).
- Added a clarification that mean pooling is already configured in `config.json` (`"pooling": "avg"`), so overriding the pooler config is generally unnecessary and not recommended for retrieval quality.
- Updated the OpenAI SDK example to include `api_key="EMPTY"` (required by the OpenAI Python client even for local vLLM).
- Added an offline vLLM Python API example (`LLM(...).embed(...)`) to clearly distinguish online serving from offline inference.
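The online/offline distinction above can be sketched as follows. This is an illustrative sketch, not code from the PR itself: the server URL/port and the example input text are assumptions, and the online half requires a server already started with `vllm serve nvidia/llama-nemotron-embed-1b-v2 --trust-remote-code`.

```python
# Online serving: query the OpenAI-compatible embeddings endpoint of a
# locally running vLLM server (default port 8000 assumed here).
from openai import OpenAI

# api_key="EMPTY" satisfies the OpenAI Python client; a local vLLM
# server does not check the key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
resp = client.embeddings.create(
    model="nvidia/llama-nemotron-embed-1b-v2",
    input=["What is the capital of France?"],  # illustrative input
)
embedding = resp.data[0].embedding

# Offline inference: load the model in-process instead of talking to a server.
from vllm import LLM

llm = LLM(model="nvidia/llama-nemotron-embed-1b-v2", trust_remote_code=True)
outputs = llm.embed(["What is the capital of France?"])
vector = outputs[0].outputs.embedding
```

Both paths return one embedding vector per input string; the offline path avoids HTTP entirely, which is convenient for batch jobs.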
Why
The model now works with modern vLLM using its default `config.json`, so the previous config replacement workflow is no longer needed. The updated docs reduce setup friction and align examples with current vLLM usage patterns.
Validation
- Confirmed startup works with the minimal command and no config replacement on vLLM 0.14.0.
- Confirmed `config.json` already encodes the expected pooling default.
- Confirmed the OpenAI client requires an API key field and works with `api_key="EMPTY"` for local vLLM.
- Confirmed output embeddings match the reference PyTorch/Transformers implementation.
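The last validation step (embeddings matching the reference implementation) is typically checked with a similarity or element-wise comparison. A minimal stdlib-only sketch, using toy stand-in vectors rather than real model outputs:

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two equal-length float vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy stand-ins for one vLLM embedding and the matching Transformers
# reference embedding; real vectors would come from the two pipelines.
vllm_vec = [0.1, 0.2, 0.3]
ref_vec = [0.1000001, 0.2, 0.2999999]

# Near-identical vectors should have cosine similarity very close to 1.
print(cosine_similarity(vllm_vec, ref_vec) > 0.9999)
```

In practice one would run the same inputs through both pipelines and assert the similarity stays above a tight threshold for every test string.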
nvidia-oliver-holworthy changed pull request status to open
ybabakhin changed pull request status to merged