Voxtral Transformers path: chat template fails ("Can't compile non template nodes") and fallback output is garbled
Issue Template (Filled): Voxtral + Transformers
Summary
Running Voxtral locally with the Transformers backend on Windows fails in multiple ways:
- `apply_chat_template` fails with `Can't compile non template nodes`
- the fallback path can fail with `name 'TranscriptionRequest' is not defined`
- the direct processor fallback can fail due to a missing pad token
- when the fallback does run, output can be garbled with repeated `[TOOL_CALLS]` markers and unreadable token soup
Environment
- OS: Windows 11 (version 10.0.26200)
- Shell: PowerShell / CMD via Pinokio
- Python: 3.10.15 (Conda base + app venv)
- App: `PierrunoYT/Voxtral-UI-Pinokio`
- Backend: Transformers (no vLLM server)
- Launch path: `D:\pinokio\api\Voxtral-UI-Pinokio.git`
Model + Runtime
- Default model: `mistralai/Voxtral-Mini-3B-2507`
- Alternative model available: `mistralai/Voxtral-Small-24B-2507`
- UI loads/downloads the model on Generate click (not at app startup)
Relevant Package/Version Notes Observed
- `transformers >= 4.56.0` used
- `accelerate >= 0.34.0` used
- Torch stack had mismatch incidents (`torchvision::nms does not exist`)
- At one point `uv` upgraded torch to `2.11.0` automatically
- `librosa` was required for fallback audio loading (`load_audio_as requires the librosa library`)
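To pin down mismatch incidents like the ones above, a small probe of the installed stack helps. This is a generic sketch, not part of the app's code; it reports each package's version without crashing when a package is missing or broken:

```python
import importlib
import importlib.util


def stack_versions() -> dict:
    """Report installed versions of the audio/ML stack; None means not installed."""
    versions = {}
    for mod in ("torch", "torchvision", "torchaudio", "transformers", "accelerate", "librosa"):
        if importlib.util.find_spec(mod) is None:
            versions[mod] = None  # package absent
            continue
        try:
            versions[mod] = getattr(importlib.import_module(mod), "__version__", "unknown")
        except Exception as exc:  # a broken install (e.g. the nms mismatch) raises on import
            versions[mod] = f"import failed: {exc}"
    return versions


if __name__ == "__main__":
    for name, version in stack_versions().items():
        print(f"{name}: {version}")
```

If `torch` and `torchvision` report versions from different release lines, the `torchvision::nms does not exist` error is the usual symptom; reinstalling both in one command from the same index avoids it.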
Reproduction Steps
- Start the app (`python app.py`) from the Pinokio env.
- Open the Gradio UI.
- Select the model (`mistralai/Voxtral-Mini-3B-2507`).
- Upload audio and enter a prompt.
- Click Generate.
Expected Behavior
Readable transcription/answer for the uploaded audio and prompt.
Actual Behavior
Generation frequently fails with one of the following:
Error 1
`Can't compile non template nodes`
Error 2
`name 'TranscriptionRequest' is not defined`
Error 3
```
Asking to pad but the tokenizer does not have a padding token.
Please select a token to use as `pad_token` ...
```
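The pad-token fallback mentioned under "What Was Already Tried" is roughly the following pattern. This is a sketch of the common Transformers workaround, shown against a generic object; the real code would operate on the Hugging Face tokenizer returned by the processor:

```python
def ensure_pad_token(tokenizer):
    """Give the tokenizer a usable pad token if it lacks one.

    Decoder-only checkpoints often ship without a pad token, and reusing
    EOS as PAD is the standard fallback. `tokenizer` is any object with
    `pad_token` / `eos_token` attributes (e.g. a Hugging Face tokenizer).
    """
    if getattr(tokenizer, "pad_token", None) is None:
        tokenizer.pad_token = tokenizer.eos_token
    return tokenizer
```

This silences Error 3 for batched/padded calls, but note it does not address Errors 1-2, which happen before padding is ever attempted.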
Error 4 (garbled output)
Output includes repeated `[TOOL_CALLS]` and unreadable token fragments (example):

```
[Model: mistralai/Voxtral-Mini-3B-2507]
[Note: chat template failed, transcription fallback was used.]
[TOOL_CALLS] [TOOL_CALLS] [TOOL_CALLS] ... ée ren ache tern ption ...
```
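The "output cleaning" step already tried can be made explicit. A minimal sketch that strips the repeated `[TOOL_CALLS]` markers; this only makes the failure readable, it cannot recover a transcription that was never generated:

```python
import re


def clean_generation(text: str) -> str:
    """Strip repeated [TOOL_CALLS] markers and collapse leftover whitespace."""
    text = text.replace("[TOOL_CALLS]", "")
    return re.sub(r"\s+", " ", text).strip()
```

For example, `clean_generation("[TOOL_CALLS] [TOOL_CALLS] hello")` returns `"hello"`. Treat this as symptom control: if the cleaned output is still token soup, the inputs fed to `generate` were wrong, not the decoding.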
Additional Related Error Encountered Earlier
Before stabilizing env, importing transformers could fail due to torch/torchvision mismatch:
```
RuntimeError: operator torchvision::nms does not exist
ModuleNotFoundError: Could not import module 'AutoProcessor'
```
What Was Already Tried
- Removed vLLM and moved to Transformers-only backend.
- Lazy model import to avoid startup crash.
- Multiple fallback input-building paths.
- Added `librosa`.
- Set a tokenizer/model `pad_token` fallback.
- Deterministic generation + output cleaning.
- Reset/reinstalled the env and attempted torch stack cleanup/reinstall.
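For completeness, "deterministic generation" amounted to greedy decoding. A sketch of the kwargs involved; the parameter names are standard `model.generate` arguments, but the specific values here are assumptions for illustration, not copied from the app:

```python
# Passed as model.generate(**inputs, **GEN_KWARGS).
GEN_KWARGS = {
    "do_sample": False,         # greedy decoding, fully deterministic
    "num_beams": 1,             # no beam search
    "max_new_tokens": 512,      # assumption: output length cap for this sketch
    "repetition_penalty": 1.1,  # assumption: mild damping of [TOOL_CALLS] loops
}
```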
Request
Please advise the correct and stable inference path for Voxtral on Transformers in this setup (Windows), specifically for:
- chat template compilation failure (`Can't compile non template nodes`)
- transcription request API consistency (the `TranscriptionRequest` path)
- avoiding garbled `[TOOL_CALLS]` generation in fallback mode
Hey @PierrunoYT - you shouldn't need to use the chat completion template for Voxtral Realtime. It's a pure transcription model, so it doesn't accept a chat-completion-style request, just an audio input. Hence the use of `TranscriptionRequest`.
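On the Transformers backend, a transcription-only call looks roughly like the sketch below. Caveats: the processor method has historically shipped as `apply_transcrition_request` (with that spelling) in some transformers releases, so check the installed version; `language="en"` is a placeholder, and the imports are deferred so the module stays importable without torch installed:

```python
def transcribe(audio_path: str, repo_id: str = "mistralai/Voxtral-Mini-3B-2507") -> str:
    """Transcription-only path for Voxtral on Transformers (sketch, not verified here)."""
    import torch
    from transformers import AutoProcessor, VoxtralForConditionalGeneration

    processor = AutoProcessor.from_pretrained(repo_id)
    model = VoxtralForConditionalGeneration.from_pretrained(
        repo_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    # Build transcription inputs directly -- no chat template involved.
    inputs = processor.apply_transcrition_request(  # (sic) -- spelling varies by release
        language="en", audio=audio_path, model_id=repo_id
    )
    inputs = inputs.to(model.device, dtype=torch.bfloat16)
    outputs = model.generate(**inputs, max_new_tokens=500)
    # Decode only the newly generated tokens, skipping special markers.
    return processor.batch_decode(
        outputs[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
    )[0]
```

Because this path never touches the chat template, it should sidestep both the `Can't compile non template nodes` error and the `[TOOL_CALLS]` garbling, which are symptoms of feeding a transcription request through a chat-completion prompt.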