xttsultravox
Mirror of fixie-ai/ultravox-v0_6-qwen-3-32b — speech-language multimodal model that ingests audio directly without a separate ASR stage.
Source attribution
- Upstream model:
fixie-ai/ultravox-v0_6-qwen-3-32b(Apache-2.0 audio adapter + Qwen 3 base) - Audio encoder:
openai/whisper-large-v3-turbo - LLM backbone:
Qwen/Qwen3-32B - Project: https://ultravox.ai
- Paper / code: https://github.com/fixie-ai/ultravox
Use
Serve with vLLM:
vllm serve amrhym/xttsultravox \
--trust-remote-code \
--tensor-parallel-size 4 \
--max-model-len 4096 \
--gpu-memory-utilization 0.83 \
--served-model-name xttsultravox
Then send audio via POST /v1/chat/completions with messages[].content[].audio_url.
- Downloads last month
- 14
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support