AttributeError when using Transformers v5

#29
by P1atinum - opened

I am using vLLM 0.16.0+cu130.
When I run uv pip install --upgrade transformers, it installs Transformers 5.2.0, but this causes an error:
AttributeError: cachedmistralcommonbackend has no attribute is_fast.
However, it works fine with Transformers 4.57.6, although there are some warnings.

It seems that the maximum version of transformers compatible with vLLM is 4.9.

vLLM 0.16.0 is officially released, so which version of Transformers should I use?
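As a stopgap, pinning Transformers to the 4.x release that worked for me avoids the AttributeError (a minimal sketch; 4.57.6 is just the version that worked in my environment):

```shell
# Pin Transformers to the last known-good 4.x release instead of upgrading to 5.x
uv pip install "transformers==4.57.6"
```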

Mistral AI_ org

Hey thanks for reaching out, can you give the code snippet you use to spin up the model as well as the error trace ?

I’m using the latest vLLM (0.16.0+cu130) and Transformers (5.2.0), and I’m launching the model with the same command provided in the model card.
detail: https://gist.github.com/ariable/a58d87605a8121b0151054e85493f089
Thank you very much.

P1atinum changed discussion status to closed
Mistral AI_ org

@P1atinum did you close because you made it work ?

This issue hasn’t been resolved yet. It’s possible that the discussion was accidentally closed. I’m considering switching to a different version for further testing.

Mistral AI_ org

I just tested using the latest vLLM Docker image and installing the latest Transformers:

vllm                              0.16.0
transformers                      5.2.0
VLLM_DISABLE_COMPILE_CACHE=1 vllm serve mistralai/Voxtral-Mini-4B-Realtime-2602 --compilation_config '{"cudagraph_mode": "PIECEWISE"}'

I don't have any issues serving the model or using the referenced examples in the model card. Are you sure you followed the provided snippets exactly?

It now works with vLLM 0.16.1rc and Transformers 5.3.0dev on CUDA 12.8. I will test it with CUDA 13 when I have time.
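For anyone debugging a similar mismatch, a quick way to confirm which versions are actually installed in the active environment (a minimal sketch; it only assumes the packages are installed under their PyPI distribution names):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

# Report the versions relevant to this thread (None means not installed)
for name in ("vllm", "transformers"):
    print(name, installed_version(name))
```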
