xttsultravox

Mirror of fixie-ai/ultravox-v0_6-qwen-3-32b — speech-language multimodal model that ingests audio directly without a separate ASR stage.

Source attribution

Use

Serve with vLLM:

vllm serve amrhym/xttsultravox \
  --trust-remote-code \
  --tensor-parallel-size 4 \
  --max-model-len 4096 \
  --gpu-memory-utilization 0.83 \
  --served-model-name xttsultravox

Then send audio via POST /v1/chat/completions with messages[].content[].audio_url.

Downloads last month
14
Safetensors
Model size
0.7B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support