SGLang deploy commands

by vvekthkr - opened Feb 12

Feb 12

Could you please share recommended SGLang deploy commands? I currently use an RTX 5090 and a Pro 6000. If all goes well, I might move from a 4B model to an 8B model with a data-parallel pipeline of 2.

bflhc

Octen-Team org Feb 12

I’m not sure whether SGLang supports deployment for this yet, but we’ve used vLLM and it does work.

You can refer to this example for details: https://huggingface.co/Qwen/Qwen3-Embedding-8B#vllm-usage

vvekthkr

Feb 21

Here is what ran well for me:

python -m sglang.launch_server \
  --model-path Octen/Octen-Embedding-0.6B \
  --host 0.0.0.0 \
  --port 5000 \
  --is-embedding \
  --enable-trace \
  --enable-metrics \
  --otlp-traces-endpoint 0.0.0.0:4317 \
  --mem-fraction-static 0.20 \
  --log-requests \
  --show-time-cost \
  --data-parallel-size 2 \
  --load-balance-method auto \
  --max-running-requests 64
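
For anyone who wants to sanity-check the server once it is up, here is a minimal client sketch. It assumes the server from the command above is reachable at localhost:5000 and that it exposes the OpenAI-style /v1/embeddings route with the usual response shape (a "data" list whose items carry "index" and "embedding"). The helper names (build_request, embed, cosine, demo) and the sample sentences are made up for illustration, so adjust them to your setup.

```python
import json
import math
import urllib.request

# Assumptions: host/port match the launch command above and the server
# is reached from the same machine.
BASE_URL = "http://localhost:5000"
MODEL = "Octen/Octen-Embedding-0.6B"


def build_request(texts, model=MODEL, base_url=BASE_URL):
    """Build a POST request for the OpenAI-style /v1/embeddings route."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(texts, timeout=30):
    """Send texts to the server and return one vector per input, in input order."""
    with urllib.request.urlopen(build_request(texts), timeout=timeout) as resp:
        body = json.load(resp)
    # Sort by the reported index so vectors line up with the inputs.
    items = sorted(body["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def demo():
    """Embed a query and a passage, then print the dimension and similarity."""
    query, passage = embed([
        "what is data parallelism?",
        "Data parallelism runs a full model replica on each GPU.",
    ])
    print(f"dim={len(query)} cosine={cosine(query, passage):.4f}")
```

Calling demo() needs the server to be running. With --log-requests enabled, each call should also show up in the server log, which is a quick way to confirm that requests are arriving.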

vvekthkr changed discussion status to closed Feb 21
