Running on vLLM 0.15.x + Blackwell CUDA 13
#14
by avblex - opened
I have an NVIDIA RTX PRO 6000 Blackwell.
Blackwell support is only available with CUDA 13.
I tried many options: I rebuilt vLLM from source and specified different attention backends, but nothing worked.
In vLLM 0.15.x there seems to be no attention backend that supports Voxtral.
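For reference, these are the kinds of commands I tried (a sketch, not a working recipe; the model name is a placeholder for whichever Voxtral checkpoint you serve, and the set of valid backend names depends on your vLLM build):

```shell
# Force a specific attention backend via vLLM's environment variable.
VLLM_ATTENTION_BACKEND=FLASH_ATTN vllm serve mistralai/Voxtral-Mini-3B-2507
# Other values I tried included FLASHINFER and XFORMERS; each failed at startup.
```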
Has anyone managed to run it on Blackwell?
same issue
Is the FlashAttention backend not supported by vLLM? Any chance you could open an issue directly on the vLLM github repo and tag me (patrickvonplaten)?
Sure @boulos, here is the Dockerfile that worked for me:
ARG BASE_IMAGE=nvcr.io/nvidia/pytorch:26.01-py3
FROM ${BASE_IMAGE}
ENV DEBIAN_FRONTEND=noninteractive \
    PIP_NO_CACHE_DIR=1 \
    PYTHONUNBUFFERED=1 \
    VLLM_TARGET_DEVICE=cuda \
    CUDAARCHS="120;121"
WORKDIR /opt
RUN --mount=type=cache,target=/root/.cache/pip \
    python3 -m pip install --upgrade pip setuptools wheel uv && \
    python3 -m pip uninstall -y vllm || true

RUN --mount=type=cache,target=/root/.cache/pip \
    UV_SYSTEM_PYTHON=1 UV_BREAK_SYSTEM_PACKAGES=1 uv pip install --force-reinstall --no-deps \
        --prerelease=allow \
        --extra-index-url https://wheels.vllm.ai/nightly/cu130 \
        vllm
RUN python3 - <<'PY'
# Write out vLLM's declared dependencies, minus the packages we want to keep
# from the NGC base image (torch, torchvision, torchaudio, triton), so a later
# `uv pip install -r` can install the rest without clobbering them.
from importlib.metadata import requires

blocked = ("torch", "torchvision", "torchaudio", "triton")
reqs = requires("vllm") or []
filtered = []
for r in reqs:
    # Drop any environment marker (text after ";"), then match by prefix.
    name = r.split(";", 1)[0].strip().lower()
    if any(name.startswith(b) for b in blocked):
        continue
    filtered.append(r)
with open("/tmp/vllm-extra-reqs.txt", "w", encoding="utf-8") as f:
    for r in filtered:
        f.write(r + "\n")
PY
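The filtering logic in that heredoc can be exercised standalone. A minimal sketch (function name and sample requirement strings are mine, not from the Dockerfile); note that prefix matching is deliberately coarse, so anything starting with `torch`, e.g. a hypothetical `torchmetrics`, would also be skipped:

```python
def filter_reqs(reqs, blocked=("torch", "torchvision", "torchaudio", "triton")):
    """Drop requirement specs whose name starts with a blocked prefix."""
    kept = []
    for r in reqs:
        # Strip any environment marker, then compare the lowercased spec prefix.
        name = r.split(";", 1)[0].strip().lower()
        if any(name.startswith(b) for b in blocked):
            continue
        kept.append(r)
    return kept

print(filter_reqs(["torch>=2.9", "numpy>=1.26", 'triton; platform_machine == "x86_64"']))
# → ['numpy>=1.26']
```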
RUN --mount=type=cache,target=/root/.cache/pip \
    UV_SYSTEM_PYTHON=1 UV_BREAK_SYSTEM_PACKAGES=1 uv pip install -r /tmp/vllm-extra-reqs.txt && \
    UV_SYSTEM_PYTHON=1 UV_BREAK_SYSTEM_PACKAGES=1 uv pip install -U soxr librosa soundfile
WORKDIR /workspace
COPY sidecars/stt_vllm/entrypoint.sh /usr/local/bin/stt-vllm-entrypoint.sh
RUN chmod +x /usr/local/bin/stt-vllm-entrypoint.sh
ENTRYPOINT ["/usr/local/bin/stt-vllm-entrypoint.sh"]
The entrypoint is just the vLLM server launched with whatever flags you need.
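For completeness, a minimal entrypoint sketch (a config fragment; the model name, host, and port are placeholders I chose, not taken from the thread):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Launch the vLLM OpenAI-compatible server; extra container args pass through.
exec vllm serve mistralai/Voxtral-Mini-3B-2507 \
    --host 0.0.0.0 \
    --port 8000 \
    "$@"
```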