Voice Agent Pipeline
Core stages from end-of-speech to first assistant audio
1
EOU Delay
Time from your last speech frame until the turn is considered complete. In this pipeline it already includes the wait for transcription.
Silence Detection
Deciding that your turn has ended
--
2
LLM TTFT
Thinking time: from LLM request start to the first generated token.
Thinking
Generating the first token
--
3
LLM to TTS Handoff
Residual orchestration gap between LLM output and TTS startup. It is shown only when this gap is measurable and greater than zero.
Pipeline Handoff
Bridging text generation to speech synthesis
--
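The "measurable and greater than zero" rule for the handoff can be sketched as below. This is only an illustration: the function name and the two timestamp parameters are hypothetical, and it assumes the gap is measured from the first LLM token to the start of the TTS request, which the panel does not state explicitly.

```python
from typing import Optional


def handoff_gap_ms(
    llm_first_token_ms: Optional[float],
    tts_request_start_ms: Optional[float],
) -> Optional[float]:
    """Return the LLM to TTS handoff gap in ms, or None when it should be hidden.

    Both arguments are hypothetical timestamps on the same clock, in milliseconds.
    """
    # Not measurable: one of the two timestamps was never recorded.
    if llm_first_token_ms is None or tts_request_start_ms is None:
        return None
    gap = tts_request_start_ms - llm_first_token_ms
    # Only a strictly positive gap is reported; zero or negative gaps are hidden.
    return gap if gap > 0 else None


print(handoff_gap_ms(1000.0, 1040.0))  # 40.0
print(handoff_gap_ms(1000.0, 1000.0))  # None
```

Returning None rather than 0 keeps "not shown" distinct from a real measurement, so a caller can skip the stage entirely.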
4
TTS TTFB
Voice generation startup only: time from TTS request start to the first audio chunk (TTFB).
Voice Generation
Starting audio synthesis
--
Total Round-Trip
End-to-End Latency
Total round-trip from end of user speech to first assistant audio: EOU Delay + Thinking (LLM TTFT) + LLM to TTS Handoff (when shown) + Voice Generation (TTS TTFB).
--
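As a rough sketch of how the end-to-end figure adds up, the snippet below sums the stage latencies and counts the handoff only when it is present and positive. The class name, field names, and sample values are illustrative assumptions, not taken from any real pipeline or measurement.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TurnLatency:
    """Per-turn stage latencies in milliseconds (hypothetical field names)."""

    eou_delay_ms: float  # end-of-utterance delay, including transcription wait
    llm_ttft_ms: float  # LLM time to first token
    tts_ttfb_ms: float  # TTS time to first audio chunk
    handoff_ms: Optional[float] = None  # optional LLM to TTS handoff gap

    def end_to_end_ms(self) -> float:
        # The handoff contributes only when it was measured and is positive.
        has_handoff = self.handoff_ms is not None and self.handoff_ms > 0
        handoff = self.handoff_ms if has_handoff else 0.0
        return self.eou_delay_ms + self.llm_ttft_ms + handoff + self.tts_ttfb_ms


# Illustrative numbers only.
turn = TurnLatency(eou_delay_ms=500.0, llm_ttft_ms=300.0, tts_ttfb_ms=200.0, handoff_ms=40.0)
print(turn.end_to_end_ms())  # 1040.0
```

With the handoff omitted, the same turn would total 1000.0 ms, which matches the three always-present stages.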