nvidia/Qwen3-8B-DMS-8x

text-generation-inference

FastDMS: Full DMS implementation running faster than vLLM BF16/FP8

#2 opened 18 days ago by

Triton kernel optimizations for DMS prefill path (up to 1.65x speedup)

#1 opened 2 months ago by