vLLM or SGLang?

by dipta007 - opened Mar 2, 2025

Discussion

dipta007

Mar 2, 2025

Does the model support vLLM or SGLang?

pigggg

Mar 3, 2025

vLLM is supported.

UserName4Ever

Jun 4, 2025

vLLM works with Docker, using this Compose file:

services:
  vllm-openai:
    image: vllm/vllm-openai:v0.8.5.post1
    runtime: nvidia
    ports:
      - "8000:8000"
    volumes:
      - /opt/vllm/models/:/models/
    environment:
      - HF_HUB_OFFLINE=1
    ipc: host
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
    command: --model ModelSpace/GemmaX2-28-9B-v0.1 --task generate --served-model-name "GemmaX2" --gpu-memory-utilization 0.9 --cpu-offload-gb 56

Test the API through /docs (Swagger UI) by sending this request body to /v1/chat/completions:

 {"model":"GemmaX2","messages":[{"role":"user","content":"Translate this from Arabic to English: Arabic: أنا أحب الترجمة الآلية English:"}],"max_tokens":512}
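The same request can also be sent from a script instead of the Swagger page. Below is a minimal sketch using only the Python standard library. It assumes the server from the Compose file above is reachable at http://localhost:8000 and that the reply follows the usual OpenAI-style chat completion shape. The helper names `build_payload` and `translate` are made up for illustration, and the prompt layout is copied from the test request above.

```python
import json
import urllib.request

# Assumption: the Compose file above publishes the server on localhost:8000.
BASE_URL = "http://localhost:8000"


def build_payload(text, src="Arabic", tgt="English", model="GemmaX2", max_tokens=512):
    """Build a chat-completions body with the prompt layout from the test request."""
    prompt = f"Translate this from {src} to {tgt}: {src}: {text} {tgt}:"
    return {
        "model": model,  # must match --served-model-name
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def translate(text, base_url=BASE_URL, timeout=120, **kwargs):
    """POST the payload to /v1/chat/completions and return the reply text."""
    body = json.dumps(build_payload(text, **kwargs)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        reply = json.load(response)
    # Assumption: OpenAI-style response with the text under choices[0].message.content.
    return reply["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Only runs when executed directly; requires the vLLM container to be up.
    print(translate("أنا أحب الترجمة الآلية"))
```

For another language pair, pass for example `src="German", tgt="English"`. With `--cpu-offload-gb 56` responses may be slow, so the timeout here is deliberately generous.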
