TurboQuant and MTP?

#8
by tasticleeze - opened

Is anyone running this on vLLM with TurboQuant (TQ) and MTP (multi-token prediction)?

Hi @tasticleeze, for information on running with TurboQuant, see https://docs.vllm.ai/en/latest/api/vllm/model_executor/layers/quantization/turboquant/

As for MTP, we don't provide a way to create MTP layers if the model does not already ship with them. As an alternative, we do provide speculator models for speculative decoding. You can view them in this search here. Hope this helps!
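Since speculative decoding came up, here is a minimal, framework-agnostic sketch of the greedy draft-and-verify loop, to show why a speculator can speed up generation without changing the output. This is not vLLM's implementation: `speculative_step`, `generate`, and the toy integer "models" are hypothetical stand-ins, and real engines also support sampling-based acceptance, which this sketch omits.

```python
from typing import Callable, List

# A "model" here is a toy stand-in: given a token prefix, it returns the
# next token greedily. Real draft/target models would be neural networks.
NextToken = Callable[[List[int]], int]


def speculative_step(prefix: List[int], draft: NextToken,
                     target: NextToken, k: int) -> List[int]:
    """Run one greedy draft-then-verify step and return the accepted tokens."""
    # 1. The cheap draft model proposes k tokens autoregressively.
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target model checks each proposed position. A real engine
    #    scores all k positions in one batched forward pass.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if tok != expected:
            # First mismatch: keep the target's own token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft matched, so the target's next prediction is a bonus token.
    accepted.append(target(ctx))
    return accepted


def generate(prefix: List[int], draft: NextToken, target: NextToken,
             k: int, max_new: int) -> List[int]:
    """Generate max_new tokens using repeated speculative steps."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[: len(prefix) + max_new]


# Hypothetical toy models: the target counts up mod 10; the draft agrees
# except after token 4, where it guesses wrong.
def target_model(ctx: List[int]) -> int:
    return (ctx[-1] + 1) % 10


def draft_model(ctx: List[int]) -> int:
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


if __name__ == "__main__":
    print(generate([0], draft_model, target_model, k=3, max_new=8))
```

Because every accepted token is one the target would have picked anyway, the result matches plain greedy decoding with the target alone; the speedup comes from accepting several tokens per target pass when the draft guesses well.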

The official model with the MTP feature is here: https://huggingface.co/google/gemma-4-31B-it-assistant

@bdellabe it would be very useful if we could run an NVFP4 version of this one.
