Voxtral Transformers path: chat template fails ("Can't compile non template nodes") and fallback output is garbled

#35

by PierrunoYT - opened Mar 28

Discussion

PierrunoYT

Mar 28

•

edited Mar 28

Issue Template (Filled): Voxtral + Transformers

Title

Voxtral Transformers path: chat template fails ("Can't compile non template nodes") and fallback output is garbled

Summary

Running Voxtral locally with the Transformers backend on Windows fails in multiple ways:

apply_chat_template fails with Can't compile non template nodes
fallback path can fail with name 'TranscriptionRequest' is not defined
direct processor fallback can fail due to missing pad token
when fallback does run, output can be garbled with repeated [TOOL_CALLS] and unreadable token soup

Environment

OS: Windows 11 (version 10.0.26200).
Shell: PowerShell / CMD via Pinokio
Python: 3.10.15 (Conda base + app venv)
App: PierrunoYT/Voxtral-UI-Pinokio
Backend: Transformers (no vLLM server)
Launch path: D:\pinokio\api\Voxtral-UI-Pinokio.git

Model + Runtime

Default model: mistralai/Voxtral-Mini-3B-2507
Alternative model available: mistralai/Voxtral-Small-24B-2507
UI loads/downloads model on Generate click (not app startup)

Relevant Package/Version Notes Observed

transformers >= 4.56.0 used
accelerate >= 0.34.0 used
Torch stack had mismatch incidents (torchvision::nms does not exist)
At one point uv upgraded torch to 2.11.0 automatically
librosa was required for fallback audio loading (load_audio_as requires the librosa library)

Reproduction Steps

Start app (python app.py) from Pinokio env.
Open Gradio UI.
Select model (mistralai/Voxtral-Mini-3B-2507).
Upload audio and enter prompt.
Click Generate.

Expected Behavior

Readable transcription/answer for the uploaded audio and prompt.

Actual Behavior

Frequently fails with one of the following:

Error 1

Can't compile non template nodes

Error 2

name 'TranscriptionRequest' is not defined

Error 3

Asking to pad but the tokenizer does not have a padding token.
Please select a token to use as `pad_token` ...

Error 4 (garbled output)

Output includes repeated [TOOL_CALLS] and unreadable token fragments (example):

[Modell: mistralai/Voxtral-Mini-3B-2507]
[Hinweis: Chat-Template fehlgeschlagen, Transkriptions-Fallback wurde verwendet.]

[TOOL_CALLS] [TOOL_CALLS] [TOOL_CALLS] ... Ã©e ren ache tern ption ...

Additional Related Error Encountered Earlier

Before stabilizing env, importing transformers could fail due to torch/torchvision mismatch:

RuntimeError: operator torchvision::nms does not exist
ModuleNotFoundError: Could not import module 'AutoProcessor'

What Was Already Tried

Removed vLLM and moved to Transformers-only backend.
Lazy model import to avoid startup crash.
Multiple fallback input-building paths.
Added librosa.
Set tokenizer/model pad_token fallback.
Deterministic generation + output cleaning.
Reset/reinstall env and torch stack cleanup/reinstall attempts.

Request

Please advise the correct and stable inference path for Voxtral on Transformers in this setup (Windows), specifically for:

chat template compilation failure (Can't compile non template nodes)
transcription request API consistency (TranscriptionRequest path)
avoiding garbled [TOOL_CALLS] generation in fallback mode.

sanchit-gandhi

Mistral AI_ org Apr 10

Hey @PierrunoYT - you shouldn't need to use the chat completion template for Voxtral Realtime. It's a pure transcription model, so doesn't accept a chat-completion style request. Just an audio input. Hence the use of the TranscriptionRequest.

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment