Loren

lsmc

AI & ML interests

None yet

Organizations

None yet

New activity in nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 2 months ago

vLLM MTP unusable on RTX 6000 Pro, as spec decoding consumes 20GB+ VRAM at start-up, causing OOM

#9 opened 2 months ago by
