AI & ML interests

AI for Data Center Build and Operations

Hellohal2064 posted an update 30 days ago
🚀 vLLM Docker Image for NVIDIA DGX Spark (GB10/SM121)

Just released a pre-built vLLM Docker image optimized for DGX Spark's ARM64 + Blackwell SM121 GPU.

**Why this exists:**
Standard vLLM images don't support SM121 - you get "SM121 not supported" errors. This image includes patches for full GB10 compatibility.

**What's included:**
- vLLM 0.15.0 + SM121 patches
- PyTorch 2.11 + CUDA 13.0
- ARM64 (aarch64) native
- Pre-configured for FlashInfer attention

**Verified models:**
- Qwen3-Next-80B-A3B-FP8 (1M context!)
- Qwen3-Embedding-8B (4096-dim embeddings)
- Qwen3-VL-30B (vision)
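Once the container is serving one of these models, vLLM's OpenAI-compatible endpoints can be queried directly. A minimal sketch, assuming the server is listening on localhost:8000 (the host and port are assumptions, not fixed by the image):

```shell
# Query the embeddings endpoint of a running vLLM server.
# localhost:8000 is an assumed address; adjust to your container's port mapping.
curl -s http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-Embedding-8B", "input": "DGX Spark"}'
```

The same server also exposes `/v1/chat/completions` for the chat and vision models.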

docker pull hellohal2064/vllm-dgx-spark-gb10
https://hub.docker.com/r/hellohal2064/vllm-dgx-spark-gb10
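A minimal launch sketch for the image above. The `--gpus`, port, and model choices are illustrative assumptions, not the image's documented invocation:

```shell
# Pull the image and serve a model with vLLM's OpenAI-compatible API.
# Flags below (GPU passthrough, port mapping, model) are assumptions.
docker pull hellohal2064/vllm-dgx-spark-gb10:latest
docker run --gpus all -p 8000:8000 \
  hellohal2064/vllm-dgx-spark-gb10:latest \
  vllm serve Qwen/Qwen3-Embedding-8B --port 8000
```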
Hellohal2064 published a Space 2 months ago
Hellohal2064 posted an update 2 months ago
🚀 Excited to share: The vLLM container for NVIDIA DGX Spark!

I've been working on getting vLLM to run natively on the new DGX Spark with its GB10 Blackwell GPU (SM121 architecture). The results? Up to 2.5x faster inference compared to llama.cpp!

📊 Performance Highlights:
• Qwen3-Coder-30B: 44 tok/s (vs 21 tok/s with llama.cpp)
• Qwen3-Next-80B: 45 tok/s (vs 18 tok/s with llama.cpp)
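The headline speedup can be checked from the tok/s figures above:

```shell
# Compute the per-model speedups from the throughput numbers in the post.
awk 'BEGIN {
  printf "Qwen3-Coder-30B: %.1fx\n", 44 / 21   # vLLM vs llama.cpp
  printf "Qwen3-Next-80B:  %.1fx\n", 45 / 18
}'
# prints 2.1x and 2.5x
```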

🔧 Technical Challenges Solved:
• Built PyTorch nightly with CUDA 13.1 + SM121 support
• Patched vLLM for Blackwell architecture
• Created custom MoE expert configs for GB10
• Implemented TRITON_ATTN backend workaround
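vLLM lets you force a specific attention backend via the `VLLM_ATTENTION_BACKEND` environment variable; a sketch of applying the workaround named above at container launch (passing it through docker this way, and the exact backend value the image expects, are assumptions):

```shell
# Force the Triton attention backend when starting the container.
# VLLM_ATTENTION_BACKEND is vLLM's backend-override env var; the rest
# of this invocation is an illustrative assumption.
docker run --gpus all -p 8000:8000 \
  -e VLLM_ATTENTION_BACKEND=TRITON_ATTN \
  hellohal2064/vllm-dgx-spark-gb10:latest
```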

📦 Available now:
• Docker Hub: docker pull hellohal2064/vllm-dgx-spark-gb10:latest
• HuggingFace: huggingface.co/Hellohal2064/vllm-dgx-spark-gb10

The DGX Spark's 119GB unified memory opens up possibilities for running massive models locally. Happy to connect with others working on the DGX Spark Blackwell!