AttributeError when using Transformers v5

#29
by P1atinum - opened

I am using vLLM 0.16.0+cu130.
When I run uv pip install --upgrade transformers, it installs Transformers 5.2.0, but this causes an error:
AttributeError: cachedmistralcommonbackend has no attribute is_fast.
However, it works fine with Transformers 4.57.6, although there are some warnings.

It seems that the maximum version of transformers compatible with vLLM is 4.9.

vLLM 0.16.0 is officially released, so which version of Transformers should I use?
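As a stopgap, pinning Transformers to the 4.x release that worked for me avoids the AttributeError (a minimal sketch; 4.57.6 is just the version that worked in my environment):

```shell
# Pin Transformers to the last known-good 4.x release instead of upgrading to 5.x
uv pip install "transformers==4.57.6"
```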

Mistral AI_ org

Hey thanks for reaching out, can you give the code snippet you use to spin up the model as well as the error trace ?

I’m using the latest vLLM (0.16.0+cu130) and Transformers (5.2.0), and I’m launching the model with the same command provided in the model card.
detail: https://gist.github.com/ariable/a58d87605a8121b0151054e85493f089
Thank you very much.

P1atinum changed discussion status to closed
Mistral AI_ org

@P1atinum did you close because you made it work ?

This issue hasn’t been resolved yet. It’s possible that the discussion was accidentally closed. I’m considering switching to a different version for further testing.

Mistral AI_ org

I just tested using the latest vLLM Docker image and installing the latest Transformers:

vllm                              0.16.0
transformers                      5.2.0
VLLM_DISABLE_COMPILE_CACHE=1 vllm serve mistralai/Voxtral-Mini-4B-Realtime-2602 --compilation_config '{"cudagraph_mode": "PIECEWISE"}'

I don't have any issues serving the model or using the referenced examples in the model card. Are you sure you followed the provided snippets exactly?

It now works with vLLM 0.16.1rc and Transformers 5.3.0dev on CUDA 12.8. I will test it with CUDA 13 when I have time.
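For anyone debugging a similar mismatch, a quick way to confirm which versions are actually installed in the active environment (a minimal sketch; it only assumes the packages are installed under their PyPI distribution names):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(dist_name):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

# Report the versions relevant to this thread (None means not installed)
for name in ("vllm", "transformers"):
    print(name, installed_version(name))
```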
