Voice Agent Pipeline
Core stages from end-of-speech to first assistant audio
1
EOU Delay
End-of-utterance (EOU) delay: time from your last speech frame until the turn is considered complete. In this pipeline, it already includes the time spent waiting for transcription.
Silence Detection
Deciding that your turn has ended
--
2
LLM TTFT
Thinking time: from LLM request start to the first generated token.
Thinking
Generating the first token
--
3
TTS TTFB
Voice generation startup only: time from TTS request start to the first audio chunk (TTFB).
Voice Generation
Starting audio synthesis
--
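LLM TTFT and TTS TTFB are the same kind of measurement: the time from starting a request to receiving the first item of a stream (a token or an audio chunk). Below is a minimal sketch of that measurement, assuming the service exposes its response as a Python iterable. The names `time_to_first_item` and `fake_llm_stream` are hypothetical and do not belong to any particular SDK.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def time_to_first_item(
    start_stream: Callable[[], Iterable[T]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[float, T, Iterator[T]]:
    """Return (seconds until the first item, the first item, the remaining stream)."""
    started = clock()              # request start
    stream = iter(start_stream())  # issue the request
    first = next(stream)           # blocks until the first token or audio chunk
    return clock() - started, first, stream


def fake_llm_stream() -> Iterator[str]:
    # Hypothetical stand-in for a streaming LLM response.
    time.sleep(0.05)               # simulated "thinking" before the first token
    yield "Hel"
    yield "lo"


ttft_s, first_token, rest = time_to_first_item(fake_llm_stream)
print(f"first token {first_token!r} after {ttft_s * 1000:.0f} ms")
```

The same helper could wrap a TTS stream to obtain TTFB. An empty stream raises `StopIteration`, which a real pipeline would probably want to handle explicitly.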
Total Round-Trip
End-to-End Latency
Total round-trip from end of user speech to first assistant audio: EOU delay + Thinking (LLM TTFT) + optional handoff + Voice Generation (TTS TTFB).
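The sum above can be sketched as a small data structure. This is an illustration only: the class name `TurnLatency`, its field names, and the sample numbers are assumptions made up for the example, not measurements or identifiers from any real pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TurnLatency:
    """Per-turn latency components, all in milliseconds (hypothetical names)."""
    eou_delay_ms: float        # last speech frame -> turn considered complete
    llm_ttft_ms: float         # LLM request start -> first generated token
    tts_ttfb_ms: float         # TTS request start -> first audio chunk
    handoff_ms: float = 0.0    # optional handoff; zero when there is none

    @property
    def end_to_end_ms(self) -> float:
        # EOU delay + LLM TTFT + optional handoff + TTS TTFB
        return (
            self.eou_delay_ms
            + self.llm_ttft_ms
            + self.handoff_ms
            + self.tts_ttfb_ms
        )


# Made-up sample values, for illustration only.
turn = TurnLatency(eou_delay_ms=500, llm_ttft_ms=350, tts_ttfb_ms=200)
print(turn.end_to_end_ms)  # 1050.0
```

Keeping the handoff as a field that defaults to zero means turns with and without a handoff go through the same formula.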
--