Running on vLLM 0.15.x + Blackwell CUDA 13

#14
by avblex - opened

I have an NVIDIA RTX PRO 6000 Blackwell.
Blackwell support is available only in CUDA version 13.
I tried many options, rebuilt vLLM from source, and tried specifying different backends, but nothing worked.
But in vLLM 0.15.x, it seems there is no attention backend support for Voxtral.
Has anyone managed to run it on Blackwell?

Mistral AI org

Is the FlashAttention backend not supported by vLLM? Any chance you could open an issue directly on the vLLM GitHub repo and tag me (patrickvonplaten)?

@avblex I am using it successfully on Blackwell GB10 (DGX Spark) using the NGC PyTorch 26.01 container and the vLLM nightly wheel. Make sure you keep the container's default PyTorch. If you need it, I can share my Dockerfile build.

Hello @bugtoo, I'm interested in your Dockerfile. Could you share it, please?

Sure @boulos, here it is:

ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:26.01-py3
FROM ${BASE_IMAGE}

ENV DEBIAN_FRONTEND=noninteractive \
    PIP_NO_CACHE_DIR=1 \
    PYTHONUNBUFFERED=1 \
    VLLM_TARGET_DEVICE=cuda \
    CUDAARCHS="120;121"

WORKDIR /opt

RUN --mount=type=cache,target=/root/.cache/pip \
   python3 -m pip install --upgrade pip setuptools wheel uv && \
   python3 -m pip uninstall -y vllm || true

RUN --mount=type=cache,target=/root/.cache/pip \
   UV_SYSTEM_PYTHON=1 UV_BREAK_SYSTEM_PACKAGES=1 uv pip install --force-reinstall --no-deps \
      --prerelease=allow \
      --extra-index-url https://wheels.vllm.ai/nightly/cu130 \
      vllm

RUN python3 - <<'PY'
# Rebuild vLLM's dependency list minus the packages the NGC base image
# already provides, so the container's CUDA 13 PyTorch build is not clobbered.
import re
from importlib.metadata import requires

blocked = {"torch", "torchvision", "torchaudio", "triton"}
reqs = requires("vllm") or []

filtered = []
for r in reqs:
    # Requirement strings look like 'name>=1.2; python_version >= "3.9"';
    # extract just the distribution name for an exact comparison.
    m = re.match(r"[A-Za-z0-9._-]+", r.strip())
    name = m.group(0).lower() if m else ""
    if name in blocked:
        continue
    filtered.append(r)

with open("/tmp/vllm-extra-reqs.txt", "w", encoding="utf-8") as f:
    for r in filtered:
        f.write(r + "\n")
PY
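To sanity-check that filtering step outside the container, the same logic can be run against a few sample requirement strings (the package names below are just illustrative, not vLLM's real dependency list):

```python
import re

# Same filter idea as in the Dockerfile: drop deps the base image already ships.
blocked = {"torch", "torchvision", "torchaudio", "triton"}

sample_reqs = [
    "torch>=2.4",
    'torchaudio ; platform_system != "Darwin"',
    "numpy>=1.26",
    "ray[default]>=2.9",
]

def keep(req: str) -> bool:
    # Extract the distribution name; the match stops before any version
    # specifier, extras bracket, or environment marker.
    m = re.match(r"[A-Za-z0-9._-]+", req.strip())
    name = m.group(0).lower() if m else ""
    return name not in blocked

filtered = [r for r in sample_reqs if keep(r)]
print(filtered)  # torch and torchaudio are dropped, the rest survive
```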

RUN --mount=type=cache,target=/root/.cache/pip \
   UV_SYSTEM_PYTHON=1 UV_BREAK_SYSTEM_PACKAGES=1 uv pip install -r /tmp/vllm-extra-reqs.txt && \
   UV_SYSTEM_PYTHON=1 UV_BREAK_SYSTEM_PACKAGES=1 uv pip install -U soxr librosa soundfile

WORKDIR /workspace

COPY sidecars/stt_vllm/entrypoint.sh /usr/local/bin/stt-vllm-entrypoint.sh
RUN chmod +x /usr/local/bin/stt-vllm-entrypoint.sh

ENTRYPOINT ["/usr/local/bin/stt-vllm-entrypoint.sh"]

Your entrypoint is just your vLLM server command with the flags you need.
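For reference, a minimal sketch of what that entrypoint could contain; the model name and serve flags here are assumptions for a Voxtral STT deployment, not bugtoo's actual script, so adjust them to your setup. The snippet writes the file to the path the Dockerfile's COPY step expects and syntax-checks it:

```shell
# Write a hypothetical stt-vllm-entrypoint.sh (contents are an illustrative
# guess, not the original) and verify it parses.
cat > /tmp/stt-vllm-entrypoint.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail
# Voxtral loads via vLLM's Mistral-native tokenizer/config/weight formats.
exec vllm serve mistralai/Voxtral-Mini-3B-2507 \
    --tokenizer-mode mistral \
    --config-format mistral \
    --load-format mistral \
    --host 0.0.0.0 \
    --port 8000
EOF
bash -n /tmp/stt-vllm-entrypoint.sh && echo "syntax OK"
```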

@bugtoo Thank you very much, I’ll try your version. It looks like this new image is just what I need.

@bugtoo Thank you !
