Open Voice Agent

Voice Agent Pipeline

Core stages from end-of-speech to first assistant audio

Initial Response

EOU (End-of-Utterance) Delay

Turn Detection

Deciding when the user's turn is complete
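The dashboard does not say how turn detection decides a turn is over. One simple approach, shown here only as an illustration and not necessarily what this agent uses, is a silence-timeout endpointer. It finalizes the turn after a fixed run of silent voice-activity-detection (VAD) frames, so the EOU delay can be no shorter than that threshold. The frame size, threshold and function name below are hypothetical.

```python
from __future__ import annotations


def find_end_of_turn(frames: list[bool], silence_frames: int) -> int | None:
    """Return the index of the frame at which the user's turn is finalized.

    frames[i] is True when a (hypothetical) VAD classified frame i as speech.
    The turn is finalized on the frame that completes `silence_frames`
    consecutive silent frames after some speech; None if that never happens.
    """
    heard_speech = False
    silent_run = 0
    for i, is_speech in enumerate(frames):
        if is_speech:
            heard_speech = True
            silent_run = 0  # any speech resets the silence counter
        elif heard_speech:
            silent_run += 1
            if silent_run >= silence_frames:
                return i
    return None


FRAME_MS = 20                          # hypothetical VAD frame size
frames = [True] * 10 + [False] * 30    # 200 ms of speech, then silence
idx = find_end_of_turn(frames, silence_frames=500 // FRAME_MS)
speech_end_ms = 10 * FRAME_MS          # speech stops at the end of frame 9
eou_delay_ms = (idx + 1) * FRAME_MS - speech_end_ms
print(idx, eou_delay_ms)  # 34 500
```

With a 500 ms silence threshold, the EOU delay in this sketch is exactly 500 ms. Shortening the threshold reduces latency but risks cutting users off mid-pause.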

LLM TTFT (Time to First Token)

Thinking

Generating the first token
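TTFT is usually measured by starting a clock when the request is sent and stopping it when the first streamed chunk arrives. The wrapper below is a minimal sketch of that idea for any iterable stream. The class name and the injectable clock are hypothetical. It assumes iteration begins at the moment the request goes out, which holds for lazily evaluated streams but not if the request was sent earlier. The same wrapper could time TTS time-to-first-byte by wrapping an audio-chunk stream instead of a token stream.

```python
from __future__ import annotations

import time
from typing import Callable, Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")


class FirstChunkTimer(Generic[T]):
    """Pass chunks through unchanged while recording time to the first one."""

    def __init__(self, stream: Iterable[T], clock: Callable[[], float] = time.monotonic) -> None:
        self._stream = stream
        self._clock = clock
        self.first_chunk_ms: float | None = None  # stays None for an empty stream

    def __iter__(self) -> Iterator[T]:
        started = self._clock()  # assumed to coincide with sending the request
        for chunk in self._stream:
            if self.first_chunk_ms is None:
                self.first_chunk_ms = (self._clock() - started) * 1000
            yield chunk


def fake_llm_stream() -> Iterator[str]:
    """Stand-in for a streaming LLM response (hypothetical)."""
    time.sleep(0.05)  # simulated delay before the first token
    yield "Hello"
    yield ", world"


timer = FirstChunkTimer(fake_llm_stream())
text = "".join(timer)
print(text, f"TTFT ~ {timer.first_chunk_ms:.0f} ms")
```

Injecting the clock keeps the wrapper testable with fake timestamps. Passing chunks through unchanged means measuring TTFT does not delay downstream consumers such as TTS.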

TTS TTFB (Time to First Byte)

Voice Generation

Starting audio synthesis

Total Round-Trip

End-to-End Latency

From end of user speech to first assistant audio
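Assume each stage starts the instant the previous one produces output: the LLM request goes out when the turn is committed, and TTS starts on the first token. Under that assumption, the three stage latencies add up exactly to the end-to-end latency. The sketch below computes all four from a hypothetical set of per-turn timestamps, and the field names are illustrative. A real pipeline may have gaps, such as TTS waiting for a full sentence, that make the stages sum to less than the total.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TurnTimestamps:
    """Hypothetical event times for one turn, in seconds on a single monotonic clock."""
    speech_end: float       # VAD: user stopped speaking
    turn_committed: float   # turn detector: user's turn is complete
    llm_first_token: float  # first token streamed back from the LLM
    tts_first_audio: float  # first audio chunk produced by TTS


def initial_response_ms(t: TurnTimestamps) -> dict[str, float]:
    """Latency of each stage, plus the end-to-end total, in milliseconds."""
    return {
        "eou_delay": (t.turn_committed - t.speech_end) * 1000,
        "llm_ttft": (t.llm_first_token - t.turn_committed) * 1000,
        "tts_ttfb": (t.tts_first_audio - t.llm_first_token) * 1000,
        "end_to_end": (t.tts_first_audio - t.speech_end) * 1000,
    }


m = initial_response_ms(TurnTimestamps(10.0, 10.5, 10.85, 11.05))
for name, ms in m.items():
    print(f"{name:>10}: {ms:7.1f} ms")
```

Computing the end-to-end figure directly from timestamps, rather than summing the stages, keeps it correct even when the back-to-back assumption does not hold.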

Tool Calls

Tool Execution

Executing external function calls

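When a turn issues more than one tool call, "tool execution" time can mean two different things. It can be the sum of each call's duration, or the wall-clock span from the first call starting to the last one finishing. The two differ when calls run in parallel, and the dashboard does not say which it reports. The sketch below, using hypothetical names, computes both and returns zero when no tools ran.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    started: float   # seconds, same monotonic clock as the rest of the turn
    finished: float


def tool_execution_ms(calls: list[ToolCall]) -> tuple[float, float]:
    """Return (wall_clock_span_ms, summed_durations_ms); (0.0, 0.0) if no tools ran."""
    if not calls:
        return 0.0, 0.0
    span = max(c.finished for c in calls) - min(c.started for c in calls)
    total = sum(c.finished - c.started for c in calls)
    return span * 1000, total * 1000


# Two overlapping (parallel) calls with hypothetical tool names:
# the span counts the overlap once, the sum counts it twice.
calls = [ToolCall("get_weather", 0.0, 0.4), ToolCall("lookup_calendar", 0.1, 0.3)]
span_ms, summed_ms = tool_execution_ms(calls)
print(f"span={span_ms:.0f} ms, sum={summed_ms:.0f} ms")  # span=400 ms, sum=600 ms
```

The span is what the user actually waits for. However, it also includes any idle gaps between sequential calls, such as an extra LLM round-trip between tools.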

Post-Tool Response

LLM TTFT

Thinking

Generating the first token after tools

TTS TTFB

Voice Generation

Starting post-tool audio synthesis

Second Audio

Tool→Audio

From tool completion to the first post-tool assistant audio
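By analogy with the initial response, the Tool→Audio interval can be split into the post-tool LLM TTFT and the post-tool TTS TTFB. The sketch below makes two assumptions: the LLM is re-invoked the moment the last tool result is available, and TTS starts on the first post-tool token. The parameter names are hypothetical.

```python
from __future__ import annotations


def post_tool_ms(tool_finished: float, llm_first_token: float, tts_first_audio: float) -> dict[str, float]:
    """Break the tool-completion-to-audio interval into its stages (milliseconds).

    All arguments are seconds on one monotonic clock (hypothetical event names).
    """
    return {
        "post_tool_llm_ttft": (llm_first_token - tool_finished) * 1000,
        "post_tool_tts_ttfb": (tts_first_audio - llm_first_token) * 1000,
        "tool_to_audio": (tts_first_audio - tool_finished) * 1000,
    }


m = post_tool_ms(tool_finished=12.0, llm_first_token=12.4, tts_first_audio=12.55)
print(m)
```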

Total Turn Duration

Complete turn including tool execution and post-tool response
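The dashboard does not define exactly where a turn ends. The sketch below assumes it runs from the end of user speech to the start of the post-tool audio. When no tools were executed, it falls back to the first assistant audio, so a tool-free turn's total equals its end-to-end latency. If the dashboard instead measures to the end of playback, that timestamp would replace the audio-start times. Names are hypothetical.

```python
from __future__ import annotations


def total_turn_ms(speech_end: float, first_audio: float, post_tool_audio: float | None = None) -> float:
    """Turn duration in milliseconds (all inputs in seconds on one monotonic clock).

    Ends at the post-tool audio when tools ran, otherwise at the first assistant audio.
    """
    end = post_tool_audio if post_tool_audio is not None else first_audio
    return (end - speech_end) * 1000


print(total_turn_ms(10.0, 11.05))          # no tools: same as end-to-end latency
print(total_turn_ms(10.0, 11.05, 12.55))   # with tools: runs to the post-tool audio
```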