Running on vLLM 0.15.x + Blackwell CUDA 13

#14
by avblex - opened

I have an NVIDIA RTX PRO 6000 Blackwell.
Blackwell support is available only in CUDA version 13.
I tried many options, rebuilt vLLM from source, and tried specifying different backends, but nothing worked.
But in vLLM 0.15.x, it seems there is no attention backend support for Voxtral.
Has anyone managed to run it on Blackwell?

Mistral AI org

Is the FlashAttention backend not supported by vLLM? Any chance you could open an issue directly on the vLLM GitHub repo and tag me (patrickvonplaten)?

@avblex I am using it successfully on Blackwell GB10 (DGX Spark) using the NGC PyTorch 26.01 container and the vLLM nightly wheel. Make sure you keep the container's default PyTorch. If you need it, I can share my Dockerfile build.

Hello @bugtoo, I'm interested in your Dockerfile. Could you share it, please?

Sure @boulos, here it is:

ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:26.01-py3
FROM ${BASE_IMAGE}

ENV DEBIAN_FRONTEND=noninteractive \
    PIP_NO_CACHE_DIR=1 \
    PYTHONUNBUFFERED=1 \
    VLLM_TARGET_DEVICE=cuda \
    CUDAARCHS="120;121"

WORKDIR /opt

RUN --mount=type=cache,target=/root/.cache/pip \
   python3 -m pip install --upgrade pip setuptools wheel uv && \
   python3 -m pip uninstall -y vllm || true

RUN --mount=type=cache,target=/root/.cache/pip \
   UV_SYSTEM_PYTHON=1 UV_BREAK_SYSTEM_PACKAGES=1 uv pip install --force-reinstall --no-deps \
      --prerelease=allow \
      --extra-index-url https://wheels.vllm.ai/nightly/cu130 \
      vllm

RUN python3 - <<'PY'
# Rebuild vLLM's dependency list minus the packages the NGC base image
# already provides, so the container's CUDA 13 PyTorch build is not clobbered.
import re
from importlib.metadata import requires

blocked = {"torch", "torchvision", "torchaudio", "triton"}
reqs = requires("vllm") or []

filtered = []
for r in reqs:
    # Requirement strings look like 'name>=1.2; python_version >= "3.9"';
    # extract just the distribution name for an exact comparison.
    m = re.match(r"[A-Za-z0-9._-]+", r.strip())
    name = m.group(0).lower() if m else ""
    if name in blocked:
        continue
    filtered.append(r)

with open("/tmp/vllm-extra-reqs.txt", "w", encoding="utf-8") as f:
    for r in filtered:
        f.write(r + "\n")
PY
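To sanity-check that filtering step outside the container, the same logic can be run against a few sample requirement strings (the package names below are just illustrative, not vLLM's real dependency list):

```python
import re

# Same filter idea as in the Dockerfile: drop deps the base image already ships.
blocked = {"torch", "torchvision", "torchaudio", "triton"}

sample_reqs = [
    "torch>=2.4",
    'torchaudio ; platform_system != "Darwin"',
    "numpy>=1.26",
    "ray[default]>=2.9",
]

def keep(req: str) -> bool:
    # Extract the distribution name; the match stops before any version
    # specifier, extras bracket, or environment marker.
    m = re.match(r"[A-Za-z0-9._-]+", req.strip())
    name = m.group(0).lower() if m else ""
    return name not in blocked

filtered = [r for r in sample_reqs if keep(r)]
print(filtered)  # torch and torchaudio are dropped, the rest survive
```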

RUN --mount=type=cache,target=/root/.cache/pip \
   UV_SYSTEM_PYTHON=1 UV_BREAK_SYSTEM_PACKAGES=1 uv pip install -r /tmp/vllm-extra-reqs.txt && \
   UV_SYSTEM_PYTHON=1 UV_BREAK_SYSTEM_PACKAGES=1 uv pip install -U soxr librosa soundfile

WORKDIR /workspace

COPY sidecars/stt_vllm/entrypoint.sh /usr/local/bin/stt-vllm-entrypoint.sh
RUN chmod +x /usr/local/bin/stt-vllm-entrypoint.sh

ENTRYPOINT ["/usr/local/bin/stt-vllm-entrypoint.sh"]

Your entrypoint is just your vLLM server command with the flags you need.
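For reference, a minimal sketch of what that entrypoint could contain; the model name and serve flags here are assumptions for a Voxtral STT deployment, not bugtoo's actual script, so adjust them to your setup. The snippet writes the file to the path the Dockerfile's COPY step expects and syntax-checks it:

```shell
# Write a hypothetical stt-vllm-entrypoint.sh (contents are an illustrative
# guess, not the original) and verify it parses.
cat > /tmp/stt-vllm-entrypoint.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
# Voxtral loads via vLLM's Mistral-native tokenizer/config/weight formats.
exec vllm serve mistralai/Voxtral-Mini-3B-2507 \
    --tokenizer-mode mistral \
    --config-format mistral \
    --load-format mistral \
    --host 0.0.0.0 \
    --port 8000
EOF
bash -n /tmp/stt-vllm-entrypoint.sh && echo "syntax OK"
```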

@bugtoo Thank you very much, I’ll try your version. It looks like this new image is just what I need.

@bugtoo Thank you !
