
fciannella

Added the healthcare example

2f49513 2 months ago

Docker Deployment - Multi-Threaded Voice Agent

Overview

This Docker container runs the complete multi-threaded telco voice agent stack:

LangGraph Server (langgraph dev) on port 2024
Pipecat Pipeline (FastAPI + WebRTC) on port 7860
React UI served at http://localhost:7860

Quick Start

Build the Image

# From project root
docker build -t voice-agent-multi-thread .

Run the Container

docker run -p 7860:7860 \
  -e RIVA_API_KEY=your_nvidia_api_key \
  -e NVIDIA_ASR_FUNCTION_ID=52b117d2-6c15-4cfa-a905-a67013bee409 \
  -e NVIDIA_TTS_FUNCTION_ID=4e813649-d5e4-4020-b2be-2b918396d19d \
  voice-agent-multi-thread
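
When the container is launched from a script rather than by hand, the same command can be assembled programmatically. The following is a minimal sketch in Python; the helper name `docker_run_command` is hypothetical and not part of this repository. It only builds the argument list shown above and does not start anything.

```python
import shlex
from typing import Mapping


def docker_run_command(image: str, env: Mapping[str, str], port: int = 7860) -> list[str]:
    """Build the `docker run` argument list for the voice agent image.

    Hypothetical helper: it mirrors the command above and does not execute it.
    """
    command = ["docker", "run", "-p", f"{port}:{port}"]
    for name, value in env.items():
        # Each variable becomes one `-e NAME=value` pair.
        command += ["-e", f"{name}={value}"]
    command.append(image)
    return command


if __name__ == "__main__":
    args = docker_run_command(
        "voice-agent-multi-thread",
        {
            "RIVA_API_KEY": "your_nvidia_api_key",
            "NVIDIA_ASR_FUNCTION_ID": "52b117d2-6c15-4cfa-a905-a67013bee409",
            "NVIDIA_TTS_FUNCTION_ID": "4e813649-d5e4-4020-b2be-2b918396d19d",
        },
    )
    # shlex.join quotes any value containing spaces or shell metacharacters.
    print(shlex.join(args))
```

The returned list can be passed directly to `subprocess.run(args, check=True)`, which avoids shell quoting problems with the API key.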

Access the UI

Open your browser to: http://localhost:7860

What Happens Inside the Container

The start.sh script orchestrates two processes:

1. LangGraph Server (Port 2024)

cd /app/examples/voice_agent_multi_thread/agents
uv run langgraph dev --no-browser --host 0.0.0.0 --port 2024

This runs the multi-threaded telco agent with:

Main thread for long operations
Secondary thread for interim queries
Store-based coordination
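
The idea behind the three points above can be illustrated with plain Python threads. This is a conceptual sketch only, not the LangGraph implementation used by the agent: `ProgressStore`, `long_operation`, and `answer_status_query` are hypothetical names, and a lock-protected dictionary stands in for the store.

```python
import threading
import time


class ProgressStore:
    """Lock-protected key/value store shared by both threads (illustrative)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data: dict = {}

    def put(self, key: str, value: dict) -> None:
        with self._lock:
            self._data[key] = value

    def get(self, key: str):
        with self._lock:
            return self._data.get(key)


def long_operation(store: ProgressStore, steps: int, step_seconds: float,
                   started: threading.Event) -> None:
    """Main thread's work: publish progress to the store after every step."""
    store.put("status", {"state": "running", "progress": 0})
    started.set()
    for step in range(1, steps + 1):
        time.sleep(step_seconds)
        store.put("status", {"state": "running", "progress": int(100 * step / steps)})
    store.put("status", {"state": "done", "progress": 100})


def answer_status_query(store: ProgressStore) -> str:
    """Secondary thread's work: read the store without blocking the operation."""
    status = store.get("status")
    if status is None:
        return "No operation in progress."
    if status["state"] == "done":
        return "The operation has finished."
    return f"The operation is {status['progress']}% complete."


if __name__ == "__main__":
    store = ProgressStore()
    started = threading.Event()
    worker = threading.Thread(target=long_operation, args=(store, 5, 0.05, started))
    worker.start()
    started.wait()
    print(answer_status_query(store))  # answered while the operation is still running
    worker.join()
    print(answer_status_query(store))
```

The point of the design is that the interim query never waits on the long operation; it only reads whatever progress was last written to the store.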

2. Pipecat Pipeline (Port 7860)

cd /app/examples/voice_agent_multi_thread
uv run pipeline.py

This runs the voice pipeline with:

WebRTC transport
RIVA ASR (speech-to-text)
LangGraphLLMService (multi-threaded routing)
RIVA TTS (text-to-speech)
React UI

Environment Variables

Required

# NVIDIA API Key for RIVA services
RIVA_API_KEY=nvapi-xxxxx

Optional

# LangGraph Configuration
LANGGRAPH_HOST=0.0.0.0
LANGGRAPH_PORT=2024
LANGGRAPH_ASSISTANT=telco-agent

# User Configuration
USER_EMAIL=user@example.com

# ASR Configuration
NVIDIA_ASR_FUNCTION_ID=52b117d2-6c15-4cfa-a905-a67013bee409
RIVA_ASR_LANGUAGE=en-US
RIVA_ASR_MODEL=parakeet-1.1b-en-US-asr-streaming-silero-vad-asr-bls-ensemble

# TTS Configuration
NVIDIA_TTS_FUNCTION_ID=4e813649-d5e4-4020-b2be-2b918396d19d
RIVA_TTS_VOICE_ID=Magpie-ZeroShot.Female-1
RIVA_TTS_MODEL=magpie_tts_ensemble-Magpie-ZeroShot
RIVA_TTS_LANGUAGE=en-US

# Zero-shot audio prompt (optional)
ZERO_SHOT_AUDIO_PROMPT_URL=https://github.com/your-repo/audio-prompt.wav

# Multi-threading (default: true)
ENABLE_MULTI_THREADING=true

# Debug
LANGGRAPH_DEBUG_STREAM=false
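
A launcher script can check these variables before starting the container. The sketch below is an assumption-laden illustration: the document does not show how `pipeline.py` parses boolean variables, so the accepted spellings in `TRUE_VALUES` and the helper names `missing_required` and `env_flag` are hypothetical.

```python
import os
from typing import Mapping

# RIVA_API_KEY is the only variable listed as required above.
REQUIRED_VARIABLES = ("RIVA_API_KEY",)

# Assumed spellings for "true"; the application's own parsing may differ.
TRUE_VALUES = {"1", "true", "yes", "on"}


def missing_required(env: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are unset or blank."""
    return [name for name in REQUIRED_VARIABLES if not env.get(name, "").strip()]


def env_flag(env: Mapping[str, str], name: str, default: bool) -> bool:
    """Read a boolean variable, falling back to `default` when unset or blank."""
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default
    return raw.strip().lower() in TRUE_VALUES


if __name__ == "__main__":
    missing = missing_required(os.environ)
    if missing:
        print("Missing required variables:", ", ".join(missing))
    # Defaults follow the values listed above: multi-threading on, debug off.
    print("Multi-threading:", env_flag(os.environ, "ENABLE_MULTI_THREADING", True))
    print("Debug stream:", env_flag(os.environ, "LANGGRAPH_DEBUG_STREAM", False))
```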

Docker Compose

Create docker-compose.yml:

version: '3.8'

services:
  voice-agent:
    build: .
    ports:
      - "7860:7860"
    environment:
      - RIVA_API_KEY=${RIVA_API_KEY}
      - NVIDIA_ASR_FUNCTION_ID=52b117d2-6c15-4cfa-a905-a67013bee409
      - NVIDIA_TTS_FUNCTION_ID=4e813649-d5e4-4020-b2be-2b918396d19d
      - USER_EMAIL=user@example.com
      - LANGGRAPH_ASSISTANT=telco-agent
      - ENABLE_MULTI_THREADING=true
    volumes:
      # Optional: mount .env file
      - ./examples/voice_agent_multi_thread/.env:/app/examples/voice_agent_multi_thread/.env:ro
      # Optional: persist audio recordings
      - ./audio_dumps:/app/examples/voice_agent_multi_thread/audio_dumps
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:7860/get_prompt"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 60s

Run with:

docker-compose up

Using .env File

Create .env in examples/voice_agent_multi_thread/:

# NVIDIA API Keys
RIVA_API_KEY=nvapi-xxxxx

# LangGraph
LANGGRAPH_ASSISTANT=telco-agent
LANGGRAPH_BASE_URL=http://127.0.0.1:2024

# User
USER_EMAIL=test@example.com

# ASR
NVIDIA_ASR_FUNCTION_ID=52b117d2-6c15-4cfa-a905-a67013bee409

# TTS
NVIDIA_TTS_FUNCTION_ID=4e813649-d5e4-4020-b2be-2b918396d19d
RIVA_TTS_VOICE_ID=Magpie-ZeroShot.Female-1

The start.sh script automatically loads this file.
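
To inspect or validate the file before building, it can be parsed outside the container. How `start.sh` loads the file is not shown here, so the following is only a minimal sketch of the `KEY=value` format used above; `parse_env_text` is a hypothetical helper, and it does not handle multi-line values or variable expansion.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            # Not a KEY=value line; ignore it rather than guess.
            continue
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


if __name__ == "__main__":
    sample = """
    # NVIDIA API Keys
    RIVA_API_KEY=nvapi-xxxxx

    # LangGraph
    LANGGRAPH_ASSISTANT=telco-agent
    LANGGRAPH_BASE_URL=http://127.0.0.1:2024
    """
    for name, value in parse_env_text(sample).items():
        print(f"{name} = {value}")
```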

Ports

Service            Internal Port   External Port   Purpose
LangGraph Server   2024            -               Agent runtime (internal only)
Pipecat Pipeline   7860            7860            WebRTC + HTTP API
React UI           -               7860            Served by pipeline

Note: Only port 7860 is exposed externally. LangGraph runs internally on 2024.

Healthcheck

The container includes a healthcheck that verifies the pipeline is responding:

curl -f http://localhost:7860/get_prompt

Check health status:

docker ps
# Look for "(healthy)" in STATUS column
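
A deployment script can wait for the same endpoint before running tests against the UI. The sketch below is one possible approach using only the Python standard library; `http_probe` and `wait_until_healthy` are hypothetical helpers, and the retry count and interval defaults simply echo the healthcheck settings from the Docker Compose example.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 10.0) -> bool:
    """Return True when the URL answers with a 2xx status (roughly `curl -f`)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, or an HTTP error status all count as unhealthy.
        return False


def wait_until_healthy(probe: Callable[[], bool], retries: int = 3,
                       interval: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `probe` up to `retries` times, sleeping `interval` seconds between tries."""
    for attempt in range(retries):
        if probe():
            return True
        if attempt < retries - 1:
            sleep(interval)
    return False


if __name__ == "__main__":
    healthy = wait_until_healthy(
        lambda: http_probe("http://localhost:7860/get_prompt", timeout=2.0),
        retries=3,
        interval=1.0,
    )
    print("healthy" if healthy else "not responding")
```

Passing the probe as a callable keeps the retry loop independent of HTTP, so it can be exercised in tests with a fake probe.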

Logs

View all logs:

docker logs -f <container-id>

You'll see both:

LangGraph server startup and agent logs
Pipeline startup and WebRTC connection logs

Testing Multi-Threading

1. Open UI: http://localhost:7860
2. Select Agent: Choose "Telco Agent"
3. Test Long Operation:
   - Say: "Close my contract"
   - Confirm: "Yes"
   - Operation starts (50 seconds)
4. Test Secondary Thread:
   - While waiting, say: "What's the status?"
   - Agent responds with progress
   - Say: "How much data do I have left?"
   - Agent answers while main operation continues

Troubleshooting

Container won't start

# Check logs
docker logs <container-id>

# Common issues:
# 1. Missing RIVA_API_KEY
# 2. Port 7860 already in use
# 3. Insufficient memory
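
The second issue can be checked from the host before starting the container. This is a small sketch under the assumption that the conflicting service listens on the loopback interface; `port_in_use` is a hypothetical helper name.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True when something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(1.0)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    if port_in_use(7860):
        print("Port 7860 is already in use; stop the other service or map a different host port.")
    else:
        print("Port 7860 is free.")
```

If the port is taken, a different host port can be mapped instead, for example `-p 8080:7860`, and the UI is then reached at http://localhost:8080.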

LangGraph not starting

# Check if agents directory exists
docker exec <container-id> ls -la /app/examples/voice_agent_multi_thread/agents

# Check langgraph.json
docker exec <container-id> cat /app/examples/voice_agent_multi_thread/agents/langgraph.json

Pipeline not responding

# Check pipeline logs
docker logs <container-id> 2>&1 | grep pipeline

# Check if port is accessible
curl http://localhost:7860/get_prompt

Multi-threading not working

# Verify env var
docker exec <container-id> env | grep MULTI_THREADING

# Check LangGraph server (lists the registered assistants)
docker exec <container-id> curl -s -X POST http://localhost:2024/assistants/search \
  -H "Content-Type: application/json" -d '{}'

Development Mode

To develop inside the container:

# Run with shell
docker run -it -p 7860:7860 \
  -v "$(pwd)/examples/voice_agent_multi_thread:/app/examples/voice_agent_multi_thread" \
  voice-agent-multi-thread /bin/bash

# Inside container:
cd /app/examples/voice_agent_multi_thread

# Start LangGraph in a background subshell so the current directory stays unchanged
(cd agents && uv run langgraph dev --no-browser --host 0.0.0.0 --port 2024) &

# Start the pipeline from the example directory
uv run pipeline.py

Building for Production

Multi-stage optimization

The Dockerfile uses a multi-stage build:

ui-builder: Compiles React UI
python base: Installs Python dependencies
Final image: ~2GB (UI + Python + agents)

Reducing image size

# Use slim Python base (already done)
FROM python:3.12-slim

# Clean up build artifacts (already done)
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

# Use uv for faster installs (already done)
RUN pip install uv

Security Considerations

Non-root user: Container runs as UID 1000
No secrets in image: Use environment variables or mount secrets
Read-only filesystem: UI dist is built at image time
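Health checks: Docker marks the container as unhealthy when the check fails; restarting it automatically requires an orchestrator or tooling that acts on the health status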

Performance

Startup time: ~30-60 seconds
Memory: ~2GB recommended
CPU: 2 cores minimum
Storage: ~3GB for image + runtime

Related Files

Dockerfile - Container definition
start.sh - Startup orchestration
agents/langgraph.json - Agent configuration
pipeline.py - Pipecat pipeline
langgraph_llm_service.py - Multi-threaded LLM service

Support

For issues:

Check logs: docker logs <container-id>
Verify environment variables
Test components individually (LangGraph, Pipeline)
Review PIPECAT_MULTI_THREADING.md for architecture details