AI & ML interests

AI for Data Center Build and Operations

Hellohal2064 posted an update 30 days ago
🚀 vLLM Docker Image for NVIDIA DGX Spark (GB10/SM121)

Just released a pre-built vLLM Docker image optimized for DGX Spark's ARM64 + Blackwell SM121 GPU.

**Why this exists:**
Standard vLLM images don't support SM121 - you get "SM121 not supported" errors. This image includes patches for full GB10 compatibility.

**What's included:**
- vLLM 0.15.0 + SM121 patches
- PyTorch 2.11 + CUDA 13.0
- ARM64 (aarch64) native
- Pre-configured for FlashInfer attention

**Verified models:**
- Qwen3-Next-80B-A3B-FP8 (1M context!)
- Qwen3-Embedding-8B (4096-dim embeddings)
- Qwen3-VL-30B (vision)
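Once the container is serving one of these models, vLLM's OpenAI-compatible endpoints can be queried directly. A minimal sketch, assuming the server is listening on localhost:8000 (the host and port are assumptions, not fixed by the image):

```shell
# Query the embeddings endpoint of a running vLLM server.
# localhost:8000 is an assumed address; adjust to your container's port mapping.
curl -s http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen/Qwen3-Embedding-8B", "input": "DGX Spark"}'
```

The same server also exposes `/v1/chat/completions` for the chat and vision models.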

docker pull hellohal2064/vllm-dgx-spark-gb10
https://hub.docker.com/r/hellohal2064/vllm-dgx-spark-gb10
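A minimal launch sketch for the image above. The `--gpus`, port, and model choices are illustrative assumptions, not the image's documented invocation:

```shell
# Pull the image and serve a model with vLLM's OpenAI-compatible API.
# Flags below (GPU passthrough, port mapping, model) are assumptions.
docker pull hellohal2064/vllm-dgx-spark-gb10:latest
docker run --gpus all -p 8000:8000 \
  hellohal2064/vllm-dgx-spark-gb10:latest \
  vllm serve Qwen/Qwen3-Embedding-8B --port 8000
```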
Hellohal2064 published a Space 2 months ago
Hellohal2064 posted an update 2 months ago
🚀 Excited to share: The vLLM container for NVIDIA DGX Spark!

I've been working on getting vLLM to run natively on the new DGX Spark with its GB10 Blackwell GPU (SM121 architecture). The results? Up to 2.5x faster inference compared to llama.cpp!

📊 Performance Highlights:
• Qwen3-Coder-30B: 44 tok/s (vs 21 tok/s with llama.cpp)
• Qwen3-Next-80B: 45 tok/s (vs 18 tok/s with llama.cpp)
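The headline speedup can be checked from the tok/s figures above:

```shell
# Compute the per-model speedups from the throughput numbers in the post.
awk 'BEGIN {
  printf "Qwen3-Coder-30B: %.1fx\n", 44 / 21   # vLLM vs llama.cpp
  printf "Qwen3-Next-80B:  %.1fx\n", 45 / 18
}'
# prints 2.1x and 2.5x
```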

🔧 Technical Challenges Solved:
• Built PyTorch nightly with CUDA 13.1 + SM121 support
• Patched vLLM for Blackwell architecture
• Created custom MoE expert configs for GB10
• Implemented TRITON_ATTN backend workaround
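vLLM lets you force a specific attention backend via the `VLLM_ATTENTION_BACKEND` environment variable; a sketch of applying the workaround named above at container launch (passing it through docker this way, and the exact backend value the image expects, are assumptions):

```shell
# Force the Triton attention backend when starting the container.
# VLLM_ATTENTION_BACKEND is vLLM's backend-override env var; the rest
# of this invocation is an illustrative assumption.
docker run --gpus all -p 8000:8000 \
  -e VLLM_ATTENTION_BACKEND=TRITON_ATTN \
  hellohal2064/vllm-dgx-spark-gb10:latest
```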

📦 Available now:
• Docker Hub: docker pull hellohal2064/vllm-dgx-spark-gb10:latest
• HuggingFace: huggingface.co/Hellohal2064/vllm-dgx-spark-gb10

The DGX Spark's 119GB unified memory opens up possibilities for running massive models locally. Happy to connect with others working on the DGX Spark Blackwell!