Instructions to use nvidia/Qwen3-8B-DMS-8x with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use nvidia/Qwen3-8B-DMS-8x with Transformers:
    # Load model directly
    from transformers import AutoModel
    model = AutoModel.from_pretrained("nvidia/Qwen3-8B-DMS-8x", trust_remote_code=True, dtype="auto")
- Notebooks
- Google Colab
- Kaggle
FastDMS: Full DMS implementation running faster than vLLM BF16/FP8
#2 opened 17 days ago by leonardlin
Triton kernel optimizations for DMS prefill path (up to 1.65x speedup)
#1 opened 2 months ago by amiga1200